9 Summary bar graphs
9.1 Description
Bar graphs can summarize data by showing a statistic for each category. The x-axis lists the categories, and the y-axis shows the values. They can include error bars to show variation or confidence intervals.
In ggplot2, we can create summary bar graphs with geom_col() (or with geom_bar() and stat = "identity").
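The error bars mentioned above are not part of this chapter's graph. A minimal sketch of one way to add them is below. It assumes the standard deviation as the measure of variation, which is a choice made here for illustration only. The names peng_err, avg_kg, sd_kg, and ggp2_err are hypothetical, and the pipeline assumes dplyr and tidyr are installed.

```r
library(palmerpenguins)
library(ggplot2)

# Hypothetical summary: mean and standard deviation of body mass (kg) per island
peng_err <- palmerpenguins::penguins |>
  dplyr::select(body_mass_g, island) |>
  tidyr::drop_na() |>
  dplyr::group_by(island) |>
  dplyr::summarise(
    avg_kg = mean(body_mass_g / 1000),
    sd_kg = sd(body_mass_g / 1000)
  )

# Bars show the mean; error bars span mean - 1 SD to mean + 1 SD (assumed choice)
ggp2_err <- ggplot(data = peng_err,
                   aes(x = island, y = avg_kg)) +
  geom_col(aes(fill = island), show.legend = FALSE) +
  geom_errorbar(aes(ymin = avg_kg - sd_kg,
                    ymax = avg_kg + sd_kg),
                width = 0.2)
ggp2_err
```

Swapping sd_kg for a standard error or a confidence half-width would give the confidence-interval version instead.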
9.2 Set up
PACKAGES:
Install packages.
install.packages("palmerpenguins")
library(palmerpenguins)
library(ggplot2)
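The data step below also calls dplyr and tidyr through ::, so those packages need to be installed too. As a minimal sketch (pkgs, is_installed, and missing_pkgs are hypothetical names), we can install only the packages that are missing:

```r
# Packages this chapter relies on; dplyr and tidyr are called with `::` in the data step
pkgs <- c("palmerpenguins", "ggplot2", "dplyr", "tidyr")

# requireNamespace() returns FALSE (quietly) when a package is not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```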
DATA:
Remove the missing values from body_mass_g and island in the palmerpenguins::penguins data, convert body mass in grams to kilograms (body_mass_kg), and calculate the average body mass for each island (avg_bmi_kg).
We'll also reduce the number of columns in the penguins data for clarity.
peng_sum_col <- palmerpenguins::penguins |>
  dplyr::select(body_mass_g, island) |>
  tidyr::drop_na() |>
  # divide the mass value by 1000
  dplyr::mutate(
    body_mass_kg = body_mass_g / 1000
  ) |>
  dplyr::group_by(island) |>
  dplyr::summarise(
    avg_bmi_kg = mean(body_mass_kg)
  ) |>
  dplyr::ungroup()
dplyr::glimpse(peng_sum_col)
#> Rows: 3
#> Columns: 2
#> $ island <fct> Biscoe, Dream, Torgersen
#> $ avg_bmi_kg <dbl> 4.716018, 3.712903, 3.706373
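As an optional cross-check of the pipeline above, a minimal base R sketch can count the missing values and recompute the averages with aggregate(). The names na_counts and avg_by_island are hypothetical; the formula method of aggregate() drops rows with missing values by default (na.action = na.omit).

```r
library(palmerpenguins)

# Count missing values in the two columns used in this chapter
na_counts <- colSums(is.na(palmerpenguins::penguins[c("body_mass_g", "island")]))
na_counts

# Mean body mass per island; rows with NA are dropped by the formula method
avg_by_island <- aggregate(body_mass_g ~ island,
                           data = palmerpenguins::penguins,
                           FUN = mean)
# Convert grams to kilograms to compare with avg_bmi_kg
avg_by_island$avg_kg <- avg_by_island$body_mass_g / 1000
avg_by_island
```

The avg_kg column should agree with avg_bmi_kg in the glimpse() output.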
9.3 Grammar
CODE:
- Create labels with labs()
- Initialize the graph with ggplot() and provide data
- Map island to x and avg_bmi_kg to y
- Inside the aes() of geom_col(), map island to fill
- Outside the aes() of geom_col(), remove the legend with show.legend = FALSE
labs_sum_col <- labs(
  title = "Average Penguin Body Mass",
  subtitle = "What is the average penguin body mass per island?",
  x = "Island",
  y = "Average Penguin Body Mass (kg)")
ggp2_sum_col <- ggplot(data = peng_sum_col,
                       aes(x = island,
                           y = avg_bmi_kg)) +
  geom_col(aes(fill = island),
           show.legend = FALSE)
ggp2_sum_col +
  labs_sum_col
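If we'd rather not pre-summarise the data, a minimal sketch is to let ggplot2 compute the mean with stat_summary() and draw it with geom = "col". The names peng_raw and ggp2_stat_sum are hypothetical; rows with a missing body mass are dropped first so mean() does not see NA values.

```r
library(palmerpenguins)
library(ggplot2)

# Keep only rows with a recorded body mass
peng_raw <- palmerpenguins::penguins[!is.na(palmerpenguins::penguins$body_mass_g), ]

# stat_summary() computes mean(y) for each x level, then draws it as a column
ggp2_stat_sum <- ggplot(data = peng_raw,
                        aes(x = island, y = body_mass_g / 1000)) +
  stat_summary(aes(fill = island),
               fun = mean,
               geom = "col",
               show.legend = FALSE)
ggp2_stat_sum
```

Adding labs_sum_col gives the same titles as the graph above; the bar heights should match avg_bmi_kg.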
GRAPH:
9.4 More Info
Below is more information on geom_bar()
vs. geom_col()
.
9.4.1 Identity vs. Count
- The geom_bar() geom can also create summary bar graphs, but we have to switch its stat argument from the default ("count") to "identity".
ggplot(data = peng_sum_col,
       aes(x = island,
           y = avg_bmi_kg)) +
  geom_bar(aes(fill = island),
           show.legend = FALSE,
           stat = "identity") +
  labs_sum_col
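To check that geom_col() and geom_bar(stat = "identity") draw the same bars, a minimal sketch can compare their computed layer data. The data frame sum_df is hypothetical, with the averages copied from the glimpse() output earlier in this chapter.

```r
library(ggplot2)

# Hypothetical pre-summarised data: one row per island
sum_df <- data.frame(
  island = factor(c("Biscoe", "Dream", "Torgersen")),
  avg_kg = c(4.716018, 3.712903, 3.706373)
)

p_col <- ggplot(sum_df, aes(x = island, y = avg_kg)) + geom_col()
p_bar <- ggplot(sum_df, aes(x = island, y = avg_kg)) + geom_bar(stat = "identity")

# Compare the bar extents computed for each layer
col_data <- layer_data(p_col)
bar_data <- layer_data(p_bar)
all.equal(col_data[c("xmin", "xmax", "ymin", "ymax")],
          bar_data[c("xmin", "xmax", "ymin", "ymax")])
```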
9.4.2 geom_bar() vs. geom_col()
- geom_bar() maps a categorical variable to x or y and displays the count for each of its levels (see stat_count() for more info).
- geom_col() maps both the x and y aesthetics and is used when we want to display numerical (quantitative) values across the levels of a categorical variable. geom_col() assumes these values have already been calculated and stored in their own column (see stat_identity() for more info).
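For contrast with the summary graph, a minimal sketch of geom_bar() with its default stat_count(): only x is mapped, and the bar heights are the number of rows per island. The name ggp2_count is hypothetical.

```r
library(palmerpenguins)
library(ggplot2)

# Only x is mapped; stat_count() counts the penguins on each island
ggp2_count <- ggplot(data = palmerpenguins::penguins,
                     aes(x = island)) +
  geom_bar()
ggp2_count

# The computed counts are stored in the built layer data
layer_data(ggp2_count)[c("x", "count")]
```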