8 Grouped bar graphs
8.1 Description
Grouped bar graphs compare multiple values across categories. Each category has a cluster of bars: the x-axis shows the categories and the y-axis shows the values. Colors or patterns differentiate the subsets within each cluster, and a legend on the side or top of the graph identifies them.
geom_col() allows us to display ‘grouped’ numerical values across levels (or groups) of a categorical variable.
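To make the "cluster of bars" idea concrete, here is a minimal sketch of a clustered layout. The toy_grp data frame, its column names, and its values are hypothetical and made up purely for illustration; position = "dodge" is the ggplot2 setting that places bars side by side instead of stacking them.

```r
library(ggplot2)

# Hypothetical toy data: two subsets ("a", "b") within each of three categories
toy_grp <- data.frame(
  category = rep(c("X", "Y", "Z"), each = 2),
  subset   = rep(c("a", "b"), times = 3),
  value    = c(3, 5, 2, 4, 6, 1)
)

# fill distinguishes the subsets; position = "dodge" clusters them per category
ggp2_toy_grp <- ggplot(data = toy_grp,
    aes(x = category, y = value)) +
  geom_col(aes(fill = subset),
    position = "dodge")
ggp2_toy_grp
```

Because fill is mapped to subset and the legend is left on, each category shows one bar per subset with a legend identifying the colors.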
8.2 Set up
PACKAGES:
Install packages.
install.packages("palmerpenguins")
library(palmerpenguins)
library(ggplot2)
DATA:
Remove the missing values and reduce the palmerpenguins::penguins dataset to only body_mass_g and island, then group the data by island and calculate the sum of body_mass_g (as sum_body_mass_g).
peng_grp_col <- palmerpenguins::penguins |>
  dplyr::select(body_mass_g, island) |>
  tidyr::drop_na() |>
  dplyr::group_by(island) |>
  dplyr::summarise(
    sum_body_mass_g = sum(body_mass_g)
  ) |>
  dplyr::ungroup()
dplyr::glimpse(peng_grp_col)
#> Rows: 3
#> Columns: 2
#> $ island <fct> Biscoe, Dream, Torgersen
#> $ sum_body_mass_g <int> 787575, 460400, 189025
8.3 Grammar
CODE:
Grouped bar graphs assume the statistical measure (i.e., the value that the length of the bars will be derived from) is contained in a variable and mapped to the x or y aesthetic.
Create labels with labs()
Initialize the graph with ggplot() and provide data
Map island to the x and sum_body_mass_g to the y
Map island to fill inside the aes() of geom_col()
labs_grp_col <- labs(
  title = "Total Penguin Mass",
  subtitle = "What's the total mass of penguins per Island?",
  x = "Island",
  y = "Total penguin body mass (g)")
ggp2_grp_col <- ggplot(data = peng_grp_col,
    aes(x = island,
        y = sum_body_mass_g)) +
  geom_col(aes(fill = island),
    show.legend = FALSE)
ggp2_grp_col +
  labs_grp_col
GRAPH:
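The grammar notes say the measure can be mapped to the x or y aesthetic. As a minimal sketch of the second option, the block below puts the totals on x and island on y, which should give horizontal bars. The peng_totals data frame is an assumption for illustration: it simply re-enters the three totals shown in the glimpse() output so the block runs on its own.

```r
library(ggplot2)

# Totals re-entered by hand from the glimpse() output in the set up section
peng_totals <- data.frame(
  island = c("Biscoe", "Dream", "Torgersen"),
  sum_body_mass_g = c(787575, 460400, 189025)
)

# Same mapping as before, with the x and y aesthetics swapped
ggp2_grp_col_h <- ggplot(data = peng_totals,
    aes(x = sum_body_mass_g, y = island)) +
  geom_col(aes(fill = island),
    show.legend = FALSE)
ggp2_grp_col_h
```

Note that the axis labels in labs_grp_col would need their x and y entries swapped before being added to this version.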
8.4 More Info
We didn’t have to calculate sum_body_mass_g (displayed on the y axis) by island, because ggplot2 can do this for us!
If we pass a categorical variable to x (like island) and a continuous variable to y (like body_mass_g), geom_col() stacks the individual values within each level of x, so each bar's height is the sum() of y for that level:
penguins |>
  dplyr::select(body_mass_g, island) |>
  tidyr::drop_na() |>
  ggplot(aes(x = island, y = body_mass_g)) +
  geom_col(aes(fill = island),
    show.legend = FALSE) +
  labs_grp_col
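One way to check that the bar heights really match the grouped totals is to inspect the data ggplot2 computes for the layer. The sketch below is a small illustration under stated assumptions: toy_raw and its values are hypothetical, and layer_data() is used to read back the stacked positions from the built plot.

```r
library(ggplot2)

# Hypothetical un-summarised data: several rows per group
toy_raw <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  val = c(1, 2, 3, 4, 5)
)

p_raw <- ggplot(toy_raw, aes(x = grp, y = val)) +
  geom_col()

# One row per input row, with stacked ymin/ymax positions
bars <- layer_data(p_raw)

# Top of each stack (highest ymax per x position)
stack_tops <- tapply(bars$ymax, as.integer(bars$x), max)

# Plain grouped sums of the raw values
group_sums <- tapply(toy_raw$val, toy_raw$grp, sum)

# TRUE when the stack tops equal the grouped sums
all.equal(as.vector(stack_tops), as.vector(group_sums))
```

geom_col() draws one rectangle per row and stacks them by default, which is why the top of each stack lands on the group total.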
We can see the underlying totaling of body_mass_g using dplyr’s group_by() and summarise() functions.
palmerpenguins::penguins |>
  dplyr::select(body_mass_g, island) |>
  tidyr::drop_na() |>
  dplyr::group_by(island) |>
  dplyr::summarise(
    `Total Penguin Body Mass (g)` = sum(body_mass_g)
  ) |>
  dplyr::ungroup() |>
  dplyr::select(`Island` = island,
    `Total Penguin Body Mass (g)`)
Island | Total Penguin Body Mass (g) |
---|---|
Biscoe | 787575 |
Dream | 460400 |
Torgersen | 189025 |
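The totals above are sums of body_mass_g, so they are in grams. If kilograms are preferred for display, a minimal sketch of the conversion follows. The total_mass_g vector is an assumption for illustration that re-enters the three table values by hand.

```r
# Totals in grams, re-entered by hand from the summary table above
total_mass_g <- c(Biscoe = 787575, Dream = 460400, Torgersen = 189025)

# 1 kg = 1000 g
total_mass_kg <- total_mass_g / 1000
total_mass_kg
```

In the pipeline itself, the same division could be applied inside summarise() before naming the column.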