Graph info

Should I use this graph?


This graph requires:

✅ multiple categorical variables

✅ a date variable

Description

An alluvial graph displays the changes in composition or flow over time or across multiple categories.

We can build alluvial charts in ggplot2 with the ggalluvial package.

See also: parallel sets

Getting set up

PACKAGES:

Install and load the packages.

Code
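# install the development version from GitHub (ggalluvial is also on CRAN)
devtools::install_github("corybrunson/ggalluvial")
install.packages("palmerpenguins")
# dplyr and tidyr are called with :: below, so they only need to be installed
library(ggalluvial)
library(palmerpenguins)
library(ggplot2)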

DATA:

Artwork by @allison_horst

Below we create a wide version of the penguins data (peng_wide), with one row per combination of year, island, sex, and species and the counts stored in freq.

Code
peng_wide <- penguins |> 
  tidyr::drop_na() |> 
  dplyr::count(year, island, sex, species) |> 
  dplyr::mutate(year = factor(year)) |> 
  dplyr::rename(freq = n)
dplyr::glimpse(peng_wide)
Rows: 30
Columns: 5
$ year    <fct> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 20…
$ island  <fct> Biscoe, Biscoe, Biscoe, Biscoe, Dream, Dream, Dream, Dream, To…
$ sex     <fct> female, female, male, male, female, female, male, male, female…
$ species <fct> Adelie, Gentoo, Adelie, Gentoo, Adelie, Chinstrap, Adelie, Chi…
$ freq    <int> 5, 16, 5, 17, 9, 13, 10, 13, 8, 7, 9, 22, 9, 23, 8, 9, 8, 9, 8…
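
Before plotting, it can help to confirm that peng_wide is in the wide ("alluvia") layout that ggalluvial works with. The sketch below rebuilds the data so it runs on its own and checks it with is_alluvia_form(). The name wide_ok is only illustrative, and treating columns 1 to 4 as the axes is an assumption about how the table will be used.

```r
library(ggalluvial)
library(palmerpenguins)

# Rebuild the wide table: one row per combination, counts in freq
peng_wide <- penguins |>
  tidyr::drop_na() |>
  dplyr::count(year, island, sex, species) |>
  dplyr::mutate(year = factor(year)) |>
  dplyr::rename(freq = n)

# Columns 1:4 (year, island, sex, species) are treated as axes here (assumption);
# silent = TRUE suppresses the message about unobserved combinations
wide_ok <- is_alluvia_form(peng_wide, axes = 1:4, silent = TRUE)
wide_ok
```

If the check passes, the axis columns can be mapped directly to axis1, axis2, and so on inside aes().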

The grammar

CODE:

Create labels with ggtitle(), ylab(), and labs()

Add scale_x_discrete() with limits set to "Year", "Island", and "Species", and expand set to c(0.1, 0.07)

Add geom_alluvium() with fill set to the sex variable and geom_stratum()

Add geom_text(), with stat set to stratum and label set to after_stat(stratum) (inside aes())

Code
labs_alluvial <- ggtitle(label = "Palmer Penguins", 
    subtitle = "Stratified by year, island and species")
labs_alluvial_y <- ylab("Frequency") 
labs_alluvial_fill <- labs(fill = "Sex")

ggp2_alluvial_w <- ggplot(data = peng_wide,
  aes(axis1 = year, axis2 = island,
      axis3 = species, y = freq)) +
  scale_x_discrete(
    limits = c("Year", "Island", "Species"),
    expand = c(0.1, 0.07)) +
  geom_alluvium(aes(fill = sex)) +
  geom_stratum() +
  geom_text(stat = "stratum", 
    aes(label = after_stat(stratum)),
    size = 3)

ggp2_alluvial_w + 
  labs_alluvial + 
  labs_alluvial_y + 
  labs_alluvial_fill

GRAPH:

The ggalluvial functions can handle wide or long data.

More info

The ggalluvial package can also help reshape data with the to_lodes_form() function.

DATA:

Below we create peng_lodes from the penguins dataset using to_lodes_form() from the ggalluvial package.

Code
peng_lodes <- penguins |> 
  dplyr::select(Year = year, Island = island, 
         Species = species, Sex = sex) |> 
  tidyr::drop_na() |> 
  dplyr::count(Year, Island, Species, Sex) |> 
  dplyr::mutate(Year = factor(Year)) |> 
  dplyr::rename(Frequency = n) |> 
  ggalluvial::to_lodes_form(key = "Measure", axes = 1:3) 
dplyr::glimpse(peng_lodes)
Rows: 90
Columns: 5
$ Sex       <fct> female, male, female, male, female, male, female, male, fema…
$ Frequency <int> 5, 5, 16, 17, 9, 10, 13, 13, 8, 7, 9, 9, 22, 23, 8, 8, 9, 9,…
$ alluvium  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ Measure   <fct> Year, Year, Year, Year, Year, Year, Year, Year, Year, Year, …
$ stratum   <fct> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, …
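
Each of the 30 rows of counts becomes one alluvium, repeated once per axis, which is why the long table has 30 × 3 = 90 rows. A minimal sketch of that check follows, using is_lodes_form() from ggalluvial. The count column is named Frequency here, and n_alluvia, n_axes, and lodes_ok are illustrative names.

```r
library(ggalluvial)
library(palmerpenguins)

# Rebuild the long ("lodes") table so the block runs on its own
peng_lodes <- penguins |>
  dplyr::select(Year = year, Island = island,
                Species = species, Sex = sex) |>
  tidyr::drop_na() |>
  dplyr::count(Year, Island, Species, Sex) |>
  dplyr::mutate(Year = factor(Year)) |>
  dplyr::rename(Frequency = n) |>
  ggalluvial::to_lodes_form(key = "Measure", axes = 1:3)

# One alluvium per row of counts, one lode per alluvium per axis
n_alluvia <- dplyr::n_distinct(peng_lodes$alluvium)
n_axes <- dplyr::n_distinct(peng_lodes$Measure)
nrow(peng_lodes) == n_alluvia * n_axes

# Check the key / value / id structure of the long table
lodes_ok <- is_lodes_form(peng_lodes, key = Measure, value = stratum,
                          id = alluvium, silent = TRUE)
lodes_ok
```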

CODE:

Create labels with labs()

Map Measure to x, Frequency to y, stratum to stratum, alluvium to alluvium, and label to stratum.

Add geom_alluvium() and map Sex to fill

Add geom_stratum() and set the width to 0.45

Add geom_text() and set stat to "stratum"
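
The steps above can be sketched as follows. This is one possible reading of them rather than the original code: the object name ggp2_alluvial_l, the text size, and the reuse of the earlier title and subtitle are assumptions, and the count column is named Frequency to match the mapping described above.

```r
library(ggplot2)
library(ggalluvial)
library(palmerpenguins)

# Rebuild the long ("lodes") data so the block runs on its own
peng_lodes <- penguins |>
  dplyr::select(Year = year, Island = island,
                Species = species, Sex = sex) |>
  tidyr::drop_na() |>
  dplyr::count(Year, Island, Species, Sex) |>
  dplyr::mutate(Year = factor(Year)) |>
  dplyr::rename(Frequency = n) |>
  ggalluvial::to_lodes_form(key = "Measure", axes = 1:3)

# Labels (title and subtitle reused from the wide example; an assumption)
labs_alluvial_l <- labs(title = "Palmer Penguins",
  subtitle = "Stratified by year, island and species",
  y = "Frequency", fill = "Sex")

ggp2_alluvial_l <- ggplot(data = peng_lodes,
  aes(x = Measure, y = Frequency,
      stratum = stratum, alluvium = alluvium,
      label = stratum)) +
  # flows between axes, colored by sex
  geom_alluvium(aes(fill = Sex)) +
  # boxes for each category on each axis
  geom_stratum(width = 0.45) +
  # category names placed on the strata (size is an assumed value)
  geom_text(stat = "stratum", size = 3)

ggp2_alluvial_l +
  labs_alluvial_l
```

In the long format a single x aesthetic (Measure) replaces axis1, axis2, and axis3, so the axis labels come from the factor levels and no scale_x_discrete(limits = ...) call is needed.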