class: center, middle, inverse, title-slide

.title[
# ggplot2 Graph Gallery
]
.subtitle[
## Categories and distributions: amounts
]
.author[
### Martin Frigaard
]
.date[
### 2022-05-22
]

---

### Load data packages

<br>

```r
library(palmerpenguins)
library(fivethirtyeight)
library(ggplot2movies)
```

---
class: left, top
background-image: url(https://allisonhorst.github.io/palmerpenguins/reference/figures/logo.png)
background-position: 95% 8%
background-size: 7%

## `palmerpenguins`

[package website](https://allisonhorst.github.io/palmerpenguins/)

```r
penguins <- palmerpenguins::penguins
penguins
```

.small[
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

## `fivethirtyeight`

[package website](https://fivethirtyeight-r.netlify.app/)

*All datasets are listed below with descriptions*

.small[
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

## `ggplot2movies`

[package website](https://github.com/hadley/ggplot2movies)

*We're using `movies_data` (a derived version of `ggplot2movies::movies`)*

```r
movies_data
```

.small[
]

---
class: inverse, center, top
background-image: url(images/ggplot2.png)
background-position: 50% 50%
background-size: 25%

# Comparing Categories and Distributions

<br><br><br><br><br><br><br><br><br><br><br>

# Amounts

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Bars

<br>

.large[
*The bar chart (or graph) is typically used to display counts. Bar charts can be arranged vertically or horizontally, stacked, diverging, or dodged. In `ggplot2`, bar charts can be built using `geom_bar()`, which counts the rows in each category, or `geom_col()`, which plots values already present in the data.*
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Bars

<br>

```r
movies_data
```

.small[
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Bars

*Map `mpaa` to the `x` axis, map it to the `fill` aesthetic inside the `aes()` of `geom_bar()`, and add the labels*

.panelset[
.panel[.panel-name[R Code]

```r
library(ggplot2) # labs(), ggplot(), and the geoms come from ggplot2
labs_geom_bar <- labs(
  x = "MPAA rating",
  title = "IMDB movie information/user ratings")
ggplot(data = movies_data,
       aes(x = mpaa)) +
  geom_bar(aes(fill = mpaa)) +
  labs_geom_bar
```

]

.panel[.panel-name[Plot]

<img src="ggp2-amounts_files/figure-html/show-geom_bar-1.png" width="80%" height="80%" style="display: block; margin: auto;" />

]
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Grouped Bars

<br>

.large[
*To create grouped bar charts (which compare the values of a numerical variable across the levels of a categorical variable), we can use the [`geom_col()`](https://ggplot2.tidyverse.org/reference/geom_bar.html) function.*
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Grouped Bars

*Map `mpaa` to the `x` axis, `rating` to the `y` axis, and `mpaa` to `fill` inside the `aes()` of `geom_col()`, and add the labels*

.panelset[
.panel[.panel-name[R Code]

```r
labs_geom_col <- labs(
  x = "MPAA rating",
  y = "Average IMDB user rating",
  title = "IMDB movie information/user ratings")
```

```r
ggplot(data = movies_data,
       aes(x = mpaa, y = rating)) +
  geom_col(aes(fill = mpaa)) +
  labs_geom_col
```

]

.panel[.panel-name[Plot]

<img src="ggp2-amounts_files/figure-html/plot-geom_col-1.png" width="90%" height="90%" style="display: block; margin: auto;" />

]
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Stacked Bars

<br>

.large[
*We can also use bars to look at a numeric and a categorical variable together by mapping the categorical variable to the `fill` aesthetic of `geom_bar()`.*
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Stacked Bars

*Map `flipper_length_mm` to the `x` axis and `sex` to `fill`, add the `geom_bar()` layer, and add the labels*

.panelset[
.panel[.panel-name[R Code]

```r
labs_geom_bar_stacked <- labs(
  x = "Flipper length (millimeters)",
  title = "Adult foraging penguins")
```

```r
# remove missing sex (filter() comes from dplyr)
penguins_stacked <- dplyr::filter(penguins, !is.na(sex))
ggplot(data = penguins_stacked,
       aes(x = flipper_length_mm, fill = sex)) +
  geom_bar() +
  labs_geom_bar_stacked
```

]

.panel[.panel-name[Plot]

<img src="ggp2-amounts_files/figure-html/plot-stacked-bars-1.png" width="90%" height="90%" style="display: block; margin: auto;" />

]
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: More Stacked Bars

.large[
*Map `island` to the `x` axis, `flipper_length_mm` to the `y` axis, and `sex` to `fill`, add the `geom_bar()` layer (with `position` and `stat`), and add the labels*
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: More Stacked Bars

.panelset[
.panel[.panel-name[R Code]

```r
geom_bar_stacked_2 <- labs(
  x = "Island in Palmer Archipelago",
  y = "Flipper length (millimeters)",
  title = "Adult foraging penguins")
ggplot(data = penguins,
       aes(x = island, y = flipper_length_mm, fill = sex)) +
  # use this to determine how many
  # sex values are NA (and in what
  # categories)
  geom_bar(position = "stack", stat = "identity") +
  geom_bar_stacked_2
```

]

.panel[.panel-name[Plot]

<img src="ggp2-amounts_files/figure-html/unnamed-chunk-1-1.png" width="972.288" style="display: block; margin: auto;" />

]
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Diverging Bars

*If you have a numeric variable with positive and negative values, consider using diverging bars with `geom_bar()`*

.panelset[
.panel[.panel-name[R Code]

```r
library(dplyr) # mutate(), if_else(), and slice_sample() come from dplyr
unisex_names <- fivethirtyeight::unisex_names
unisex_names_diff <- mutate(unisex_names,
  male_female_diff = male_share - female_share,
  diff_cat = if_else(
    male_female_diff > 0,
    true = "More common male name",
    false = "More common female name"))
sample_names <- slice_sample(unisex_names_diff, n = 10)
```

]

.panel[.panel-name[Data]

.small[
]

]
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Diverging Bars

*Here we use the `reorder()` function to order the levels of `name` by `male_female_diff`, and map `diff_cat` to `label`.*

.panelset[
.panel[.panel-name[R Code]

.code70[

```r
labs_geom_bar_diverg <- labs(
  x = "Name",
  y = "Male share - female share",
  title = "Most Common Unisex Names In America",
  fill = "Difference category")
ggplot(data = sample_names,
       # order the names by male_female_diff
       aes(x = reorder(x = name, male_female_diff),
           y = male_female_diff,
           label = diff_cat)) +
  geom_bar(
    aes(fill = diff_cat),
    stat = "identity", width = .5) +
  labs_geom_bar_diverg
```

]

]

.panel[.panel-name[Plot]

<img src="ggp2-amounts_files/figure-html/unnamed-chunk-2-1.png" width="972.288" style="display: block; margin: auto;" />

]
]

---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Diverging Bars (vertical)

*Diverging bar charts can also be arranged along the vertical axis, with the bars extending horizontally. To do this, we switch the `x` and `y` axis variables (and move the `reorder()` call to `y`).*

.panelset[
.panel[.panel-name[R Code]

.code70[

```r
# the axis labels are switched along with the variables
labs_geom_bar_diverg_vert <- labs(
  x = "Male share - female share",
  y = "Name",
  title = "Most Common Unisex Names In America",
  fill = "Difference category")
ggplot(data = sample_names,
       aes(x = male_female_diff,
           # order the names by male_female_diff
           y = reorder(x = name, male_female_diff),
           label = diff_cat)) +
  geom_bar(
    aes(fill = diff_cat),
    stat = "identity", width = .5) +
  labs_geom_bar_diverg_vert
```

]

]

.panel[.panel-name[Plot]

<img src="ggp2-amounts_files/figure-html/unnamed-chunk-3-1.png" width="972.288" style="display: block; margin: auto;" />

]
]
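
---
class: left, top
background-image: url(images/pdg-hex.png)
background-position: 95% 8%
background-size: 7%

# Amounts: Diverging Bars with `geom_col()`

*The diverging bars above use `geom_bar(stat = "identity")`. Because `geom_col()` uses the identity stat by default, the same chart can be sketched with it instead. This is a minimal, self-contained sketch that assumes `ggplot2`, `dplyr`, and `fivethirtyeight` are installed; the seed and the object name `p_diverg_col` are illustrative choices, not part of the slides above.*

```r
library(ggplot2)
library(dplyr)

# Rebuild the difference column and category used in the diverging bar slides
unisex_names_diff <- mutate(fivethirtyeight::unisex_names,
  male_female_diff = male_share - female_share,
  diff_cat = if_else(
    male_female_diff > 0,
    true = "More common male name",
    false = "More common female name"))

# Arbitrary seed (an assumption) so the random sample is reproducible
set.seed(2022)
sample_names <- slice_sample(unisex_names_diff, n = 10)

# geom_col() plots the y/x values as-is, so no stat argument is needed
p_diverg_col <- ggplot(data = sample_names,
       aes(x = male_female_diff,
           y = reorder(name, male_female_diff))) +
  geom_col(aes(fill = diff_cat), width = .5) +
  labs(
    x = "Male share - female share",
    y = "Name",
    title = "Most Common Unisex Names In America",
    fill = "Difference category")

p_diverg_col
```

*Dropping `stat = "identity"` keeps the layer call shorter; `geom_bar()` remains the better fit when the bar heights should be row counts.*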