1  Bar graphs

This graph requires:

✅ a categorical variable


1.1 Description

A bar graph compares values across categories using rectangular bars whose length or height encodes the quantity. Bars can be vertical or horizontal. In a vertical bar graph, the horizontal axis lists the categories and the vertical axis shows the quantities being measured; a horizontal bar graph swaps the two axes. When the bars compare more than one variable, a legend usually explains the colors or patterns used.
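The two orientations and the legend described above can be sketched in ggplot2 as follows. This is a minimal illustration, assuming ggplot2 3.3.0 or later (which lets geom_bar() draw horizontal bars when the categories are mapped to y). The object names p_vertical, p_horizontal and p_legend are illustrative only.

```r
library(ggplot2)

# Vertical bars: categories on the x axis, counts on the y axis
p_vertical <- ggplot(palmerpenguins::penguins, aes(x = species)) +
  geom_bar()

# Horizontal bars: map the categories to y instead of x
p_horizontal <- ggplot(palmerpenguins::penguins, aes(y = species)) +
  geom_bar()

# A second categorical variable mapped to fill produces a legend
p_legend <- ggplot(palmerpenguins::penguins, aes(x = species, fill = island)) +
  geom_bar()

p_legend
```

Mapping island to fill stacks the island counts inside each species bar, and ggplot2 adds the legend automatically.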

In ggplot2, bar graphs can be built using geom_bar() (see also: geom_col()).

1.2 Set up

PACKAGES:

Install and load the packages.

install.packages(c("palmerpenguins", "ggplot2", "dplyr"))
library(palmerpenguins)
library(ggplot2)

DATA:

Artwork by Allison Horst

Remove the rows with a missing species from the palmerpenguins::penguins data and store the result in penguins_bar.

penguins_bar <- palmerpenguins::penguins |> 
    dplyr::filter(!is.na(species))
dplyr::glimpse(penguins_bar)
#> Rows: 344
#> Columns: 8
#> $ species           <fct> Adelie, Adelie, Adelie…
#> $ island            <fct> Torgersen, Torgersen, …
#> $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, …
#> $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, …
#> $ flipper_length_mm <int> 181, 186, 195, NA, 193…
#> $ body_mass_g       <int> 3750, 3800, 3250, NA, …
#> $ sex               <fct> male, female, female, …
#> $ year              <int> 2007, 2007, 2007, 2007…
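Before plotting, it can help to tabulate the categorical variable, because these frequencies are the bar heights geom_bar() will draw. The sketch below uses base R's table(). It assumes R 4.1 or later for the native pipe, and species_counts is an illustrative name.

```r
# Recreate penguins_bar: drop rows with a missing species
penguins_bar <- palmerpenguins::penguins |>
  dplyr::filter(!is.na(species))

# One frequency per category: Adelie 152, Chinstrap 68, Gentoo 124
species_counts <- table(penguins_bar$species)
species_counts
```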

1.3 Grammar

CODE:

Create labels with labs()

Initialize the graph with ggplot() and provide data

Map species to the x axis

Map species to the fill aesthetic inside the aes() of geom_bar()

Remove the legend with show.legend = FALSE

labs_bar <- labs(
  title = "Adult foraging penguins",
  subtitle = "Number of penguins per species",
  x = "Species",
  y = "Count",
  fill = "Species"
)
ggp2_bar <- ggplot(data = penguins_bar,
                   aes(x = species)) +
  geom_bar(aes(fill = species),
           show.legend = FALSE)
ggp2_bar +
  labs_bar

GRAPH:

1.4 More info

  • The connection between statistical transformations and geoms is an important principle for building graphs (and mastering the grammar) with ggplot2
    • Below we cover why geom_bar(stat = "count") produces the same result as stat_count(geom = "bar")

“Every geom has a default stat, and every stat a default geom.” - ggplot2 book

  • Bar graphs can also be created with geom_col()
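The quote can be checked directly, because every layer object stores the geom and stat it was built with. The sketch below inspects the layers returned by geom_bar() and stat_count(). It relies on the $stat and $geom fields of ggplot2 layer objects, and bar_layer and count_layer are illustrative names.

```r
library(ggplot2)

# Build each layer with its defaults, without adding it to a plot
bar_layer <- geom_bar()
count_layer <- stat_count()

# Default stat of geom_bar(): "StatCount"
class(bar_layer$stat)[1]

# Default geom of stat_count(): "GeomBar"
class(count_layer$geom)[1]
```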

1.4.1 stats and geoms

stat_count():

In geom_bar(), the stat argument defaults to "count", which ‘counts the number of cases at each x position’. This makes geom_bar() well suited to categorical variables (factors).
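You can see what stat_count() computes with ggplot2's layer_data(), which returns a layer's data after the stat has run. In this minimal sketch, no y aesthetic is supplied. The count column (and y, which defaults to it) is produced by the stat. The names p and bar_data are illustrative.

```r
library(ggplot2)

# geom_bar() with its default stat = "count"; only x is mapped
p <- ggplot(palmerpenguins::penguins, aes(x = species)) +
  geom_bar()

# Data after the stat: one row per x position with a computed `count`
bar_data <- layer_data(p)
bar_data[, c("x", "y", "count")]
```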

Conversely, stat_count() can create the same bar graph through its geom argument.

The link between geom_<geom>(stat = "<stat>") and stat_<stat>(geom = "<geom>") is shown below with geom_bar(stat = "count") and stat_count(geom = "bar"):

ggp2_geom_bar <- ggplot(data = penguins_bar,
       aes(x = species)) +
    geom_bar(aes(fill = species), 
        stat = "count") + 
    labs(title = "geom_bar(stat = 'count')")
ggp2_geom_bar


ggp2_stat_count <- ggplot(data = penguins_bar,
       aes(x = species)) +
    stat_count(aes(fill = species), 
        geom = "bar") + 
    labs(title = "stat_count(geom = 'bar')")
ggp2_stat_count
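To confirm that the two routes produce the same graph beyond what the eye can see, you can compare the computed layer data of both plots. This is a minimal sketch using layer_data(). The names p_geom, p_stat, d_geom and d_stat are illustrative.

```r
library(ggplot2)

# Same graph built from the geom side and from the stat side
p_geom <- ggplot(palmerpenguins::penguins, aes(x = species)) +
  geom_bar(stat = "count")
p_stat <- ggplot(palmerpenguins::penguins, aes(x = species)) +
  stat_count(geom = "bar")

d_geom <- layer_data(p_geom)
d_stat <- layer_data(p_stat)

# Both should report the same counts and the same bar tops
identical(d_geom$count, d_stat$count)
all.equal(as.numeric(d_geom$ymax), as.numeric(d_stat$ymax))
```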

1.4.2 geom_col()

geom_col():

geom_col() uses stat = "identity", so it draws the values already present in the data. To create a bar graph with geom_col(), compute the counts first and then map them to the y aesthetic.

penguins_bar |> 
    # create column of counts 
    dplyr::count(species, name = "count") |> 
    # map into x and y
    ggplot(mapping = aes(x = species, y = count)) +
    geom_col(aes(fill = species), 
        show.legend = FALSE) + 
    labs_bar +
    labs(caption = "*geom_col()")
    

# compare to geom_bar()
ggp2_bar +
  labs_bar
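Because geom_col() uses the identity stat, it can be thought of as geom_bar(stat = "identity"). The sketch below builds both from the same precomputed counts and compares the bar heights ggplot2 computes. It assumes R 4.1 or later for the native pipe. The names species_counts, p_col and p_identity are illustrative.

```r
library(ggplot2)

# Precompute one count per species, as geom_col() expects
species_counts <- palmerpenguins::penguins |>
  dplyr::count(species, name = "count")

# geom_col() and geom_bar(stat = "identity") both draw the y values as given
p_col <- ggplot(species_counts, aes(x = species, y = count)) +
  geom_col()
p_identity <- ggplot(species_counts, aes(x = species, y = count)) +
  geom_bar(stat = "identity")

# Should be TRUE: the bar heights match
identical(layer_data(p_col)$y, layer_data(p_identity)$y)
```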