27 Raincloud plots
27.1 Description
Raincloud plots combine a density graph, a box plot, and a beeswarm (or jitter) plot, and are used to compare distributions of quantitative/numerical variables across the levels of a categorical (or discrete) grouping variable.
We can use the raincloudplots package to create raincloud plots, or they can be built using the ggdist package and geoms from ggplot2.
27.2 Set up
PACKAGES:
Install packages.
remotes::install_github('jorvlan/raincloudplots')
remotes::install_github('mjskay/ggdist')
library(raincloudplots)
library(ggdist)
library(palmerpenguins)
library(ggplot2)
library(dplyr) # for filter() and glimpse() used below
DATA:
Remove the missing values from year and flipper_length_mm in the penguins data. The raincloudplots package has a data_1x1() function we can use to build the dataset for a 1x1 repeated measures graph (peng_1x1).
This function takes two array arguments (array_1 and array_2), which we create with the flipper length (flipper_length_mm) for two levels of year in the peng_raincloud data.
The jit_distance and jit_seed arguments control the jittered points in the plot: jit_distance sets the distance between points, and jit_seed is the value used in set.seed().
# remove missing
peng_raincloud <- palmerpenguins::penguins |>
  dplyr::filter(!is.na(year) & !is.na(flipper_length_mm))
# filter flipper length by years 2008 & 2009
peng_1x1 <- raincloudplots::data_1x1(
  array_1 = dplyr::filter(peng_raincloud, year == 2008)$flipper_length_mm,
  array_2 = dplyr::filter(peng_raincloud, year == 2009)$flipper_length_mm,
  jit_distance = 0.2, # distance between points
  jit_seed = 2736) # used in set.seed()
dplyr::glimpse(peng_1x1)
#> Rows: 233
#> Columns: 4
#> $ y_axis <int> 186, 188, 190, 200, 187, 191, 186…
#> $ x_axis <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ id <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…
#> $ jit <dbl> 1.0200375, 0.9498208, 0.8820888, …
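To make the jit column less opaque, here is a base-R sketch of a long-format table with the same four columns shown above. It assumes the jitter is uniform noise of at most jit_distance around each x_axis value, generated after a call to set.seed(); that is an assumption about the behaviour, not the package's source code, and the flipper lengths (along with the toy_1x1 name) are made up for illustration.

```r
# Hypothetical flipper lengths for two groups (not the penguins data)
array_1 <- c(186L, 188L, 190L, 200L)
array_2 <- c(195L, 197L, 210L, 203L)

jit_distance <- 0.2
jit_seed <- 2736

# Stack both arrays and record which group (1 or 2) each value belongs to
toy_1x1 <- data.frame(
  y_axis = c(array_1, array_2),
  x_axis = rep(c(1, 2), times = c(length(array_1), length(array_2))),
  id     = factor(c(seq_along(array_1), seq_along(array_2)))
)

# Seeded jitter: the same seed always yields the same point positions
set.seed(jit_seed)
toy_1x1$jit <- jitter(toy_1x1$x_axis, amount = jit_distance)

str(toy_1x1)
```

Every jit value stays within jit_distance of its group position (1 or 2), which is why the points form two narrow bands rather than two vertical lines.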
27.3 Grammar (raincloudplots)
CODE:
Create labels with labs()
Use raincloudplots::raincloud_1x1() to build the plot, assigning peng_1x1 to data_1x1
Assign colors and fills
Set the size (of the points) and alpha (for opacity)
ggp2_raincloud <- raincloudplots::raincloud_1x1(
  data_1x1 = peng_1x1,
  colors = (c('#0bd3d3', '#282b2d')),
  fills = (c('#0bd3d3', '#282b2d')),
  size = 0.8,
  alpha = 3/4,
  ort = 'h')
ggp2_raincloud_x <- ggp2_raincloud +
  ggplot2::scale_x_continuous(
    breaks = c(1, 2),
    labels = c("2008", "2009"),
    limits = c(0, 3))
ggp2_raincloud_x +
  ggplot2::labs(
    title = "Flipper length of Palmer penguins",
    subtitle = "Years 2008 & 2009",
    x = "Year",
    y = "Flipper length (mm)")
GRAPH:
27.4 More info
DATA:
We’ll use the peng_raincloud data (with the missing values removed from species and body_mass_g).
# remove missing
peng_raincloud <- palmerpenguins::penguins |>
  dplyr::filter(!is.na(species) &
    !is.na(body_mass_g))
dplyr::glimpse(peng_raincloud)
#> Rows: 342
#> Columns: 8
#> $ species <fct> Adelie, Adelie, Adelie…
#> $ island <fct> Torgersen, Torgersen, …
#> $ bill_length_mm <dbl> 39.1, 39.5, 40.3, 36.7…
#> $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, 19.3…
#> $ flipper_length_mm <int> 181, 186, 195, 193, 19…
#> $ body_mass_g <int> 3750, 3800, 3250, 3450…
#> $ sex <fct> male, female, female, …
#> $ year <int> 2007, 2007, 2007, 2007…
27.4.1 Box-plot
Create labels with labs()
Initialize the graph with ggplot() and provide data
For the first layer, we create a box plot with geom_boxplot(), but include notches and remove the outliers.
labs_raincloud_2 <- ggplot2::labs(
  title = "Flipper length of Palmer penguins",
  x = "Flipper length (mm)",
  y = "Species")
ggp2_box <- ggplot(peng_raincloud,
  aes(x = flipper_length_mm, y = species)) +
  geom_boxplot(aes(fill = species),
    notch = TRUE,
    notchwidth = 0.9,
    width = 0.15,
    outlier.shape = NA,
    show.legend = FALSE)
ggp2_box +
  labs_raincloud_2
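The notches drawn by notch = TRUE mark an approximate confidence interval around each median; the ggplot2 documentation gives their extent as 1.58 * IQR / sqrt(n). The base-R sketch below computes a comparable interval with boxplot.stats() on made-up values. Base R uses hinges rather than quantile-based quartiles, so the numbers can differ slightly from what ggplot2 draws. Note also that outlier.shape = NA only hides the outlier points; those values still contribute to the box statistics.

```r
# Hypothetical flipper lengths (mm) for one group, with one extreme value
flipper <- c(181, 186, 195, 193, 190, 181, 195, 193, 190, 186, 180, 230)

s <- boxplot.stats(flipper)

# Median and the hinge-based spread used for the notch
med    <- s$stats[3]
spread <- s$stats[4] - s$stats[2]

# Notch limits: median +/- 1.58 * spread / sqrt(n)
notch <- med + c(-1.58, 1.58) * spread / sqrt(s$n)

notch   # same interval that boxplot.stats() reports in s$conf
s$out   # values flagged as outliers (the kind hidden by outlier.shape = NA)
```

When the notches of two boxes do not overlap, that is usually read as rough evidence that the medians differ.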
27.4.2 Density
We then add a horizontal density curve with ggdist::stat_halfeye(), mapping species to fill, adjusting the size and shape of the density curve, and shifting it slightly above the box plot.
ggp2_stat_halfeye <- ggp2_box +
  ggdist::stat_halfeye(aes(fill = species),
    adjust = 0.6, # shape = adjust * density estimator
    .width = 0, # can use probabilities or 0
    point_colour = NA, # removes the point in center
    orientation = "horizontal", # like the box plot
    height = 0.5, # height of curve
    justification = -0.3, # shift vertically above box
    show.legend = FALSE # don't need this
  )
ggp2_stat_halfeye +
  labs_raincloud_2
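The adjust argument scales the bandwidth of the kernel density estimate, so values below 1 give a wigglier curve and values above 1 a smoother one. The base-R sketch below illustrates the idea with stats::density() on simulated values. ggdist may use a different default bandwidth estimator, so this shows the concept rather than the exact curve stat_halfeye() draws.

```r
# Simulated measurements (hypothetical, not the penguins data)
set.seed(321)
x <- rnorm(100, mean = 200, sd = 14)

d_default <- density(x)                # adjust = 1
d_narrow  <- density(x, adjust = 0.6)  # bandwidth scaled by 0.6

# The bandwidth actually used is adjust * the default bandwidth
c(default = d_default$bw, narrow = d_narrow$bw)
```

Plotting both with plot(d_default) and lines(d_narrow) makes the difference in smoothness visible.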
27.4.3 Points
The final layer is a geom_point(), mapping species to fill and setting position to position_jitter(). Additional adjustments to the points include:
Using shape = 21, we can color the outside of the point (white makes it appear to glow).
Manually setting the height, which refers to the vertical area for the points.
ggp2_jitter <- ggp2_stat_halfeye +
  geom_point(aes(fill = species),
    position = position_jitter(
      seed = 321,
      height = .07),
    shape = 21,
    color = "#ffffff",
    alpha = 1/3,
    size = 1.8,
    show.legend = FALSE)
ggp2_jitter +
  labs_raincloud_2
27.5 More examples & resources
27.5.1 Point shape
Cédric Scherer covered raincloud plots in this great write-up for #TidyTuesday.
Cédric also covers some alternative methods for plotting the points (I particularly like using bands instead of points when displaying the rainclouds vertically).
We can switch to this layout by applying ggplot2::coord_flip() to the ggp2_stat_halfeye layer, then adding geom_point() with shape set to 95.
ggp2_stat_halfeye +
  ggplot2::coord_flip() +
  ggplot2::geom_point(
    shape = 95,
    size = 8,
    alpha = 0.2) +
  theme(legend.position = "none") +
  labs_raincloud_2
27.5.2 #TidyTuesday example
The code to re-create the #TidyTuesday graph is contained in this gist.
27.5.3 More resources