layout: true

<!-- this adds the link footer to all slides, depends on footer-small class in css-->

<div class="footer-small"><span>https://github.com/mjfrigaard/ph-lacounty-r/</span></div>

---
name: title-slide
class: title-slide, center, middle, inverse

# R Markdown Visualizations

#.fancy[Creating Graphs in R Markdown]

<br>

.large[by Martin Frigaard]

Written: October 03 2022

Updated: December 15 2022

.footer-large[.right[.fira[
<br><br><br><br><br>[Created using the "λέξις" theme](https://jhelvy.github.io/lexis/index.html#what-does-%CE%BB%CE%AD%CE%BE%CE%B9%CF%82-mean)
]]]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Materials

The slides are in the `slides.pdf` file

--

The materials for this training are in the `worksheets` folder:

```
worksheets
├── import.Rmd
├── export.Rmd
├── objects.Rmd
├── rmd-basic.Rmd
├── rmd-tables.Rmd
└── rmd-visualizations.Rmd
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Outline

<br>

.leftcol[

#### 1. Importing data

#### 2. Common Data Objects

#### 3. R Markdown

]

--

.rightcol[

#### 4. .red[R Markdown Data Visualizations]

#### 5. R Markdown Tables

#### 6. Exporting Data

]

---
background-image: url(www/pdg-hex.png)
class: center, middle, inverse
background-position: 96% 4%
background-size: 6%

# .large[R Markdown Data Visualizations]

--

<br><br>

.font90[.green[Open `rmd-visualizations.Rmd` to follow along]]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# R Markdown Data Visualizations

The `NHANES` package comes with data from the [2014 American National Health and Nutrition Examination surveys](http://www.cdc.gov/nchs/data/series/sr_02/sr02_162.pdf).
We will load a sample from it below:

```r
library(NHANES)
library(dplyr) # provides select() and glimpse()
SmallNhanes <- NHANES |>
  select(ID, Gender, Age, AgeDecade, Race1, HealthGen,
         Height, BMI, Weight, Pulse, BPSysAve)
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Quick Tip: Column Names

Standardize names with `janitor::clean_names()`

```r
SmallNhanes <- SmallNhanes |> janitor::clean_names()
glimpse(SmallNhanes)
```

.code50[
```
Rows: 10,000
Columns: 11
$ id         <int> 51624, 51624, 51624, 51625, 51630, 51638, 51646, 51647, 51647, 51647, 51654, 51…
$ gender     <fct> male, male, male, male, female, male, male, female, female, female, male, male,…
$ age        <int> 34, 34, 34, 4, 49, 9, 8, 45, 45, 45, 66, 58, 54, 10, 58, 50, 9, 33, 60, 16, 56,…
$ age_decade <fct> 30-39, 30-39, 30-39, 0-9, 40-49, 0-9, 0-9, 40-49, 40-49, 40-49, 60-6…
$ race1      <fct> White, White, White, Other, White, White, White, White, White, White, White, Wh…
$ health_gen <fct> Good, Good, Good, NA, Good, NA, NA, Vgood, Vgood, Vgood, Vgood, Vgood, Fair, NA…
$ height     <dbl> 164.7, 164.7, 164.7, 105.4, 168.4, 133.1, 130.6, 166.7, 166.7, 166.7, 169.5, 18…
$ bmi        <dbl> 32.22, 32.22, 32.22, 15.30, 30.57, 16.82, 20.64, 27.24, 27.24, 27.24, 23.67, 23…
$ weight     <dbl> 87.4, 87.4, 87.4, 17.0, 86.7, 29.8, 35.2, 75.7, 75.7, 75.7, 68.0, 78.4, 74.7, 3…
$ pulse      <int> 70, 70, 70, NA, 86, 82, 72, 62, 62, 62, 60, 62, 76, 80, 94, 74, 92, 96, 84, 76,…
$ bp_sys_ave <int> 113, 113, 113, NA, 112, 86, 107, 118, 118, 118, 111, 104, 134, 104, 127, 142, 9…
```
]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Formatting factors

We have a `health_gen` variable with the following levels: Excellent, Vgood, Good, Fair, or Poor. These levels are ordered.
--

```r
SmallNhanes <- SmallNhanes |>
  mutate(health_gen = factor(x = health_gen,
                             levels = c("Poor", "Fair", "Good",
                                        "Vgood", "Excellent"),
                             ordered = TRUE))
```

--

```r
levels(SmallNhanes$health_gen)
```

```
[1] "Poor"      "Fair"      "Good"      "Vgood"     "Excellent"
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# `ggplot2`

### The *.red[Layered]* grammar of graphics

--

<br>

How it works:

1) Graphs are _initialized_ with `ggplot()`

--

2) Variables are _mapped_ to aesthetics

--

3) Geoms are linked to _statistics_

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

<br><br><br><br><br>

.large[*What relationship do we expect to see between height and weight?*]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

#### .font90[1) Use data with pipe to initialize graph]

.font80[

`SmallNhanes |>`

]

--

#### .font90[2) Map variables to aesthetics]

.font80[

`SmallNhanes |>`

`ggplot(mapping = aes(x = weight, y = height))`

]

--

#### .font90[3) Add geoms and layers]

.font80[

`SmallNhanes |>`

`ggplot(mapping = aes(x = weight, y = height)) +`

`geom_point()`

]

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

.border[

```r
SmallNhanes %>% 
* ggplot() # initialize
```

<img src="www/initialize-1.png" width="80%" height="80%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

.border[

```r
SmallNhanes %>% 
* ggplot(mapping = aes(x = weight, y = height)) # map variables
```

<img src="www/mapping-1.png" width="80%" height="80%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

.border[

```r
SmallNhanes %>% 
  ggplot(mapping = aes(x = weight, y = height)) + 
* geom_point() # add geoms
```

<img src="www/geoms-1.png" width="80%" height="80%" style="display: block; margin: auto;" />
]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# `ggplot2` template

#### Initialize the plot with `ggplot()`, map the aesthetics, and add a `<GEOM_FUNCTION>`

```r
<DATA> %>% 
  ggplot(mapping = aes(<MAPPINGS>)) + 
  <GEOM_FUNCTION>()
```

--

#### We can add more aesthetics *inside* geoms

```r
<DATA> %>% 
  ggplot(mapping = aes(<MAPPINGS>)) + 
* <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# `ggplot2` template

#### Because `ggplot2` is a language of layers, we can continue adding *more* geoms

```r
<DATA> %>% 
  ggplot(mapping = aes(<MAPPINGS>)) + 
* <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) + 
* <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

#### Note the different syntax (.red[%>%] vs. .red[+])

```r
<DATA> %>% #<< pipe!
  ggplot(mapping = aes(<MAPPINGS>)) + #<< plus!
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Aesthetics

#### Is the relationship between `weight` and `height` the same for both `gender`s?
--

*We can explore this by mapping the variables to different aesthetics*

--

#### Aesthetics as graph elements (`color`, `size`, `shape`, and `alpha`)

.border[

<img src="www/graph-elements.png" width="80%" height="80%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Global `ggplot2` mapping

### ***inside the `ggplot()` function*** = setting variables ***globally***

<img src="www/ggplot2-template-01.png" width="90%" height="90%" />

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Local `ggplot2` mapping

### ***inside the `geom()` function*** = setting variables ***locally***

<img src="www/ggplot2-template-02.png" width="85%" height="85%" />

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Your Turn

### Set local vs. global aesthetic mappings

.leftcol[

*From here...*

```r
SmallNhanes %>% 
  ggplot(
*   mapping = 
*     aes(x = weight, y = height)) + 
  geom_point() + 
  geom_smooth()
```

]

--

.rightcol[

*...to here.*

```r
SmallNhanes %>% 
  ggplot() + 
  geom_point(
*   mapping = 
*     aes(x = weight, y = height)) + 
  geom_smooth(
*   mapping = 
*     aes(x = weight, y = height))
```

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Your Turn (solution 1)

.border[

```r
SmallNhanes %>% 
* ggplot(mapping = aes(x = weight, y = height)) + 
  geom_point() + 
  geom_smooth()
```

<img src="www/aes-in-ggplot2-sol-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Your Turn (solution 2)

.border[

```r
SmallNhanes %>% 
  ggplot() + 
* geom_point(mapping = aes(x = weight, y = height)) + 
* geom_smooth(mapping = aes(x = weight, y = height))
```

<img src="www/aes-in-geom-sol-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Variables, Aesthetics, and Geoms

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Variables, Aesthetics, and Geoms (1)

Each graph needs a variable or value, an aesthetic, and a geom (the accompanying graphic geometry)

--

```r
*geom_point(mapping = aes(x = weight, y = height)) + # layer 1
*geom_smooth(mapping = aes(x = weight, y = height)) # layer 2
```

--

| variable  |   aesthetic   |       geom       |
|:---------:|:-------------:|:----------------:|
| `weight`  | position = `x`| dots = `point`   |
| `height`  | position = `y`| dots = `point`   |
| `weight`  | position = `x`| line = `smooth`  |
| `height`  | position = `y`| line = `smooth`  |

--

These have the same aesthetics! What if we added a layer with a variable mapped to a different aesthetic?

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Variables, Aesthetics, and Geoms (2)

But we can add *more* variables, map them to *different* aesthetics, and *add* another `geom` layer

--

Add another layer, coloring the points by `gender`

```r
SmallNhanes %>% 
  ggplot() + 
* geom_point(mapping = aes(x = weight, y = height)) + 
* geom_point(mapping = aes(color = gender))
```

--

| variable  |   aesthetic    |       geom       |
|:---------:|:--------------:|:----------------:|
| `weight`  | position = `x` | dots = `point`   |
| `height`  | position = `y` | dots = `point`   |
| `gender`  | color = `color`| dots = `point`   |

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Variables, Aesthetics, and Geoms (3)

.leftcol55[

#### ERROR!
```r
SmallNhanes %>% 
  ggplot() + 
  geom_point(
*   aes(x = weight, y = height)) + 
  geom_point(
*   aes(color = gender))
```

```r
# Error: geom_point requires the following missing aesthetics: x and y
```

]

.rightcol45[

#### SOLUTION

All `geom`s have required aesthetics, so map `x` and `y` globally

```r
SmallNhanes %>% 
  ggplot(
*   aes(x = weight, y = height)) + 
* geom_point(aes(color = gender))
```

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Aesthetics: color

.border[

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(aes(color = gender))
```

<img src="www/color-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Aesthetics: size

.border[

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(aes(color = gender, size = gender))
```

<img src="www/size-point-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Aesthetics: shape

.border[

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(aes(color = gender, size = gender, shape = gender))
```

<img src="www/shape-point-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Aesthetics: alpha (opacity)

.border[

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(aes(color = gender, alpha = gender))
```

<img src="www/alpha-point-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Aesthetic mappings

--

.pull-left[

#### Legend is automatically included

#### Continuous variables work best with `size`

]

--

.pull-right[.border[

<img src="www/aes-settings.png"
width="100%" height="100%" style="display: block; margin: auto;" />

]]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Setting values vs. mapping variables

### How can we create this plot?

.border[

<img src="www/red-points-1-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Inside `aes()`

.border[

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(aes(color = "red")) # inside aes
```

<img src="www/inside-aes-no-eval-1.png" width="65%" height="65%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

### Outside `aes()`

.border[

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(color = "red") # outside aes
```

<img src="www/red-points-1.png" width="65%" height="65%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## What happened?

`aes()` expected a variable, not a value (`"red"`). Inside `aes()`, `"red"` is treated as a one-level variable and mapped to the default color scale, so the points are not drawn in red.
```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
* geom_point(aes(color = "red")) # "value" in aes
```

--

.border[

<img src="www/inside-aes-no-eval-2-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Geoms (geometric objects)

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Geoms

--

### These are visual elements used to represent the data in the graph

--

### Examples include:

- `geom_boxplot`
- `geom_col`
- `geom_line`
- `geom_smooth`

--

### See the cheatsheet for more examples:

https://bit.ly/ggplot2-cheat

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Your Turn

--

### *How does BMI vary across levels of self-reported general health?*

--

### Complete the code below:

Map the variables locally inside the `geom_boxplot()` function

```r
SmallNhanes %>% 
  ggplot() + 
  geom_boxplot(mapping = aes(x = __________, y = ___))
```

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

```r
SmallNhanes %>% 
  ggplot() + 
* geom_boxplot(mapping = aes(x = health_gen, y = bmi))
```

--

#### Box plots are great for seeing how a continuous variable varies across a categorical variable

.border[

<img src="www/box-plot-show-1-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Your Turn

<br>

--

### Fill in the code below to change the colors in the boxplot for each level of `health_gen`

```r
SmallNhanes %>% 
  ggplot() + 
  geom_boxplot(
*   aes(x = health_gen, y = bmi, _____ = health_gen))
```

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

```r
SmallNhanes %>% 
  ggplot() + 
  geom_boxplot(
*   aes(x = health_gen, y = bmi, color = health_gen))
```

--

.border[

*Color is not the setting we want
here...*

<img src="www/box-plot-color-1-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

```r
SmallNhanes %>% 
  ggplot() + 
  geom_boxplot(
*   aes(x = health_gen, y = bmi, fill = health_gen))
```

--

.border[

*Fill is better*

<img src="www/box-plot-fill-1-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Adding layers

--

### The 'infinitely extensible' part of `ggplot2` is where we start to really see its power

--

### Consider the relationship between `height` and `weight` again

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

```r
SmallNhanes %>% 
* ggplot(aes(x = weight, y = height)) + # global
  geom_point(aes(color = gender))
```

--

.border[

<img src="www/layer-1-plot-1-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
  geom_point(aes(color = gender)) + 
* geom_smooth(data = # data 2
*   filter(SmallNhanes, gender == "male"), # layer 2
    aes(x = weight, y = height), 
    color = "blue")
```

--

.border[

<img src="www/layer-2-plot-1-1.png" width="64%" height="64%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 4% 96%
background-size: 6%

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
  geom_point(aes(color = gender)) + 
  geom_smooth(data = 
    filter(SmallNhanes, gender == "male"), 
    aes(x = weight, y = height), 
    color = "blue") + 
* geom_smooth(data = # data 3
*   filter(SmallNhanes, gender == "female"), # layer 3
    aes(x = weight, y = height), 
    color = "red")
```

--

.border[

<img src="www/layer-3-plot-1-1.png" width="48%" height="18%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Facets

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Faceting

### Facet layers display subplots for levels of categorical variables

<br>

| Facet layer                        | Display                                            |
|:-----------------------------------|:---------------------------------------------------|
| `facet_wrap(. ~ gender)`           | Plot for each level of `gender`                    |
| `facet_wrap(race1 ~ gender)`       | Plot for each combination of `race1` and `gender`  |
| `facet_wrap(. ~ gender, ncol = 1)` | Specify the number of columns                      |
| `facet_wrap(. ~ gender, nrow = 1)` | Specify the number of rows                         |

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Facet Single Variable

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
  geom_point(aes(color = gender)) + 
* facet_wrap(. ~ gender)
```

.border[

<img src="www/facet_wrap-1-1.png" width="52%" height="35%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Facet Two Variables

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
  geom_point(aes(color = gender)) + 
* facet_wrap(race1 ~ gender)
```

.border[

<img src="www/facet_wrap-2vars-1-1.png" width="52%" height="42%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Facet: Set Columns

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
  geom_point(aes(color = gender)) + 
* facet_wrap(race1 ~ gender, ncol = 5)
```

.border[

<img src="www/facet_wrap-cols-1-1.png" width="52%" height="42%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

## Facet: Set Rows

```r
SmallNhanes %>% 
  ggplot(aes(x = weight, y = height)) + 
  geom_point(aes(color = gender)) + 
* facet_wrap(race1 ~ gender, nrow = 2)
```

.border[

<img
src="www/facet_wrap-rows-1-1.png" width="52%" height="42%" style="display: block; margin: auto;" />

]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Recap

#### 1) Introduction to the grammar of graphics syntax

#### 2) Identifying graph aesthetics (position, color, shape, opacity, etc.)

#### 3) Recognizing and using `geoms` (`geom_point`, `geom_smooth`, etc.)

#### 4) Faceting graphs (`facet_wrap` with one or two variables)
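
---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Recap (sketch)

A minimal sketch that puts the four recap points into one plot. It assumes the `NHANES`, `dplyr`, `ggplot2`, and `janitor` packages are installed. The name `recap_plot`, the `alpha = 0.3` setting, and the black smoother are illustrative choices, not part of the worksheets.

```r
library(NHANES)   # survey data used throughout these slides
library(dplyr)    # select()
library(ggplot2)  # ggplot(), geoms, and facets

# rebuild a small version of the data with clean column names
SmallNhanes <- NHANES |>
  select(ID, Gender, Age, Height, Weight) |>
  janitor::clean_names()

# recap_plot is a hypothetical name used only for this sketch
recap_plot <- SmallNhanes |>
  ggplot(mapping = aes(x = weight, y = height)) +  # global mapping
  geom_point(mapping = aes(color = gender),        # local mapping (variable)
             alpha = 0.3) +                        # value set outside aes()
  geom_smooth(color = "black") +                   # second layer, inherits x and y
  facet_wrap(. ~ gender)                           # one panel per gender

recap_plot
```

Because `x` and `y` are mapped globally, both layers inherit them. Only `color` is mapped locally, and `alpha` is set as a fixed value outside `aes()`.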