39 Parallel sets
39.1 Description
Parallel sets (also referred to as Sankey diagrams or Alluvial charts) show the counts of categorical variables connected via a two-sided parallel display (or ‘sets’). Parallel sets can also be used to show different states of paired dependent relationships (such as input vs output), or time 1 vs time 2.
The height of the connecting bands between the categories on the x axis represents the relative counts for each discrete level (displayed on the y axis). The levels within each variable are represented with color.
We can build parallel set diagrams with the ggforce package.
Also check out alluvial charts.
39.2 Set up
PACKAGES:
Install packages.
# pak::pak("thomasp85/ggforce")
install.packages("palmerpenguins")
library(ggforce)
library(palmerpenguins)
library(ggplot2)
library(dplyr) # count(), rename(), glimpse()
library(tidyr) # drop_na()
DATA:
We’re going to remove the missing values from palmerpenguins::penguins, count the categorical variables (island, species, sex), and rename the n column (produced by the count() function) to value.
ggforce has a special gather_set_data() function that changes tidy data into a tidy(er) format: it stacks the table of counts once per categorical variable and adds the id, x, and y columns.
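# count each island/species/sex combination among complete rows
peng_wide <- palmerpenguins::penguins |>
  tidyr::drop_na() |>
  dplyr::count(island, species, sex) |>
  dplyr::rename(value = n)
# reshape the first three columns (the categorical variables) into sets
para_set_peng <- ggforce::gather_set_data(
  data = peng_wide,
  x = 1:3)
dplyr::glimpse(para_set_peng)
#> Rows: 30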
#> Columns: 7
#> $ island <fct> Biscoe, Biscoe, Biscoe, Biscoe, …
#> $ species <fct> Adelie, Adelie, Gentoo, Gentoo, …
#> $ sex <fct> female, male, female, male, fema…
#> $ value <int> 22, 22, 58, 61, 27, 28, 34, 34, …
#> $ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1…
#> $ x <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,…
#> $ y <fct> Biscoe, Biscoe, Biscoe, Biscoe, …
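To make the reshaped columns easier to read, here is a rough base R sketch of the same kind of stacking on a tiny made-up table. It is only an approximation written to mirror the id, x, and y columns in the glimpse() output above, not ggforce's implementation; the toy_wide data and the set_cols name are hypothetical.

```r
# Toy wide table of counts (hypothetical data, not the penguin counts)
toy_wide <- data.frame(
  colour = c("red", "red", "blue"),
  shape  = c("circle", "square", "circle"),
  value  = c(5L, 3L, 7L),
  stringsAsFactors = FALSE
)

# Columns that will become the vertical axes of the diagram
set_cols <- c("colour", "shape")

# Give every original row an id so it can be traced across axes
toy_wide$id <- seq_len(nrow(toy_wide))

# Stack one copy of the table per categorical column:
# x records which column the copy stands for,
# y holds that column's level for the row
toy_long <- do.call(rbind, lapply(seq_along(set_cols), function(i) {
  out <- toy_wide
  out$x <- i
  out$y <- toy_wide[[set_cols[i]]]
  out
}))

toy_long
```

Each original row appears once per axis, so the three rows become six and every axis stacks to the same total count (15 in this toy table).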
39.3 Grammar
CODE:
- Create labels with labs()
- Initialize the graph with ggplot() and provide data
- Map x to x, id to id, y to split, and value to value
- In the geom_parallel_sets() function, map sex to fill and manually set the alpha (opacity) and the axis.width
- In the geom_parallel_sets_axes() function, set the axis.width to the same value as the geom_parallel_sets() above
- For labeling, adjust the size manually and set the color to something that stands out against the black vertical axes
- Manually label the x axis with scale_x_continuous(), setting the breaks and labels to the variable names in the peng_wide dataset
- Finally, remove the x title with axis.title.x = element_blank()
# labels
labs_psets <- labs(
  title = "Categories of Palmer Penguins",
  y = "Count", fill = "Sex")
# bands connecting the sets, filled by sex
ggp2_psets <- ggplot(data = para_set_peng,
    mapping = aes(x = x,
                  id = id,
                  split = y,
                  value = value)) +
  geom_parallel_sets(aes(fill = sex),
    alpha = 0.3,
    axis.width = 0.07)
# vertical axes (same axis.width as the bands)
ggp2_psets_axes <- ggp2_psets +
  geom_parallel_sets_axes(
    axis.width = 0.07)
# level labels and x axis labels
ggp2_psets_labs <- ggp2_psets_axes +
  geom_parallel_sets_labels(
    size = 2.0,
    color = '#ffffff') +
  scale_x_continuous(
    breaks = c(1, 2, 3),
    labels = c("Island", "Species", "Sex")) +
  theme(axis.title.x = element_blank())
ggp2_psets_labs +
  labs_psets
GRAPH:
39.4 More info
If the categories have long names, you can move the location of the labels outside the set.
39.4.1 Labeling sets
Use the angle, nudge_x/nudge_y, and hjust/vjust arguments in geom_parallel_sets_labels() to rotate and reposition the labels, and the size and colour arguments to change their size and color.
- Manually setting the limits of the x axis in scale_x_continuous() will also give more room for the labels.
ggp2_psets_axes +
  geom_parallel_sets_labels(
    size = 3.2,
    colour = '#000000',
    angle = 0,
    nudge_x = 0.1,
    hjust = 0) +
  scale_x_continuous(
    limits = c(0.9, 3.2),
    breaks = c(1, 2, 3),
    labels = c("Island", "Species", "Sex")) +
  theme(axis.title.x = element_blank()) +
  labs_psets
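The description mentions that parallel sets can also show paired states such as time 1 vs time 2. As a minimal sketch of how such data might be prepared, the base R code below counts the combinations of two states into a wide table with a value column. The paired data frame and its time_1 and time_2 columns are made up for illustration.

```r
# Hypothetical paired observations: the same six units at two time points
paired <- data.frame(
  time_1 = c("low", "low", "high", "high", "low", "high"),
  time_2 = c("low", "high", "high", "high", "high", "low")
)

# Cross-tabulate the two states and flatten to one row per combination;
# responseName = "value" matches the count column name used in this chapter
paired_wide <- as.data.frame(
  table(paired),
  responseName = "value",
  stringsAsFactors = TRUE
)

# Drop combinations that never occur so they do not become empty bands
paired_wide <- paired_wide[paired_wide$value > 0, ]

paired_wide
```

The result has the same shape as peng_wide (categorical columns followed by value), so it should be possible to pass it to ggforce::gather_set_data() with x = 1:2 and plot it with the same layers as above.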