27  Raincloud plots

This graph is largely complete and just needs final proofreading.


This graph requires:

✅ a categorical variable

✅ a numeric (continuous) variable

27.1 Description

Raincloud plots combine a density graph, a box plot, and a beeswarm (or jitter) plot. They are used to compare the distribution of a numeric variable across the levels of a categorical (or discrete) grouping variable.

We can use the raincloudplots package to create raincloud plots, or they can be built using the ggdist package and geoms from ggplot2.

27.2 Set up

PACKAGES:

Install and load the packages.

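# install the development versions from GitHub (requires the remotes package)
remotes::install_github('jorvlan/raincloudplots')
remotes::install_github('mjskay/ggdist')
library(raincloudplots)
library(ggdist)
library(palmerpenguins)
library(dplyr) # for filter() and glimpse()
library(ggplot2)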

DATA:

Artwork by Allison Horst

Remove the missing values from year and flipper_length_mm in the penguins data. The raincloudplots package has a data_1x1() function we can use to build the dataset for a 1x1 repeated measure graph (peng_1x1).

This function takes two array arguments (array_1 and array_2), which we create from the flipper lengths (flipper_length_mm) for two levels of year (2008 and 2009) in the peng_raincloud data.

The jit_distance and jit_seed arguments control the jittered points in the plot: jit_distance sets the distance between points, and jit_seed is passed to set.seed() so the jitter is reproducible.

# remove rows with missing year or flipper length
peng_raincloud <- palmerpenguins::penguins |> 
    dplyr::filter(!is.na(year) & !is.na(flipper_length_mm))
# filter flipper length by years 2008 & 2009
peng_1x1 <- raincloudplots::data_1x1(
  array_1 = dplyr::filter(peng_raincloud, year == 2008)$flipper_length_mm,
  array_2 = dplyr::filter(peng_raincloud, year == 2009)$flipper_length_mm,
  jit_distance = 0.2, # distance between points
  jit_seed = 2736) # used in set.seed() 
dplyr::glimpse(peng_1x1)
#> Rows: 233
#> Columns: 4
#> $ y_axis <int> 186, 188, 190, 200, 187, 191, 186…
#> $ x_axis <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ id     <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…
#> $ jit    <dbl> 1.0200375, 0.9498208, 0.8820888, …
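
The jit column in the output holds the jittered x positions. As a rough illustration of what jit_distance and jit_seed do, here is a minimal base R sketch. It assumes the jitter is uniform noise drawn after a call to set.seed(); it is not the package's own code, and x_axis, jit_a, and jit_b are names made up for this example.

```r
# Illustration only: seeded jitter around a fixed x position.
# This mirrors the idea behind jit_distance and jit_seed; it is not
# the raincloudplots implementation.
x_axis <- rep(1, 5)

set.seed(2736)
jit_a <- jitter(x_axis, amount = 0.2) # uniform noise within +/- 0.2 of 1

set.seed(2736)
jit_b <- jitter(x_axis, amount = 0.2) # same seed, so the same noise

identical(jit_a, jit_b) # TRUE: the point positions are reproducible
range(jit_a)            # stays inside 0.8 to 1.2
```

Changing the seed moves the points around within the same band; changing the amount widens or narrows the band.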

27.3 Grammar (raincloudplots)

CODE:

Use raincloudplots::raincloud_1x1() to build the plot, assigning peng_1x1 to data_1x1:

  • assign colors and fills

  • set the size (of the points) and alpha (for opacity)

  • set the orientation with ort

Then relabel the two x-axis positions with scale_x_continuous() and create labels with labs().

# build the raincloud plot from the 1x1 data
ggp2_raincloud <- raincloudplots::raincloud_1x1(
    data_1x1 = peng_1x1,
    colors = c('#0bd3d3', '#282b2d'),
    fills = c('#0bd3d3', '#282b2d'),
    size = 0.8, # point size
    alpha = 3/4, # opacity
    ort = 'h') # orientation

# label the two x-axis positions with the years
ggp2_raincloud_x <- ggp2_raincloud +
    ggplot2::scale_x_continuous(
        breaks = c(1, 2),
        labels = c("2008", "2009"),
        limits = c(0, 3))

# add the title, subtitle, and axis labels
ggp2_raincloud_x + 
    ggplot2::labs(
        title = "Flipper length of Palmer penguins",
        subtitle = "Years 2008 & 2009", 
        x = "Year", 
        y = "Flipper length (mm)")

GRAPH:

27.4 More info

DATA:

We’ll use the peng_raincloud data (with the missing values removed from species and body_mass_g).

# remove rows with missing species or body mass
peng_raincloud <- palmerpenguins::penguins |> 
    dplyr::filter(!is.na(species) & 
                  !is.na(body_mass_g))
dplyr::glimpse(peng_raincloud)
#> Rows: 342
#> Columns: 8
#> $ species           <fct> Adelie, Adelie, Adelie…
#> $ island            <fct> Torgersen, Torgersen, …
#> $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7…
#> $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3…
#> $ flipper_length_mm <int> 181, 186, 195, 193, 19…
#> $ body_mass_g       <int> 3750, 3800, 3250, 3450…
#> $ sex               <fct> male, female, female, …
#> $ year              <int> 2007, 2007, 2007, 2007…

27.4.1 Box-plot

  • Create labels with labs()

  • Initialize the graph with ggplot() and provide data

  • For the first layer, we create a box plot with geom_boxplot(), adding notches (notch = TRUE) and hiding the outliers (outlier.shape = NA).

labs_raincloud_2 <- ggplot2::labs(
    title = "Flipper length of Palmer penguins",
    x = "Flipper length (mm)",
    y = "Species")
ggp2_box <- ggplot(peng_raincloud,
    aes(x = flipper_length_mm, y = species)) + 
  geom_boxplot(aes(fill = species),
    notch = TRUE, 
    notchwidth = 0.9,
    width = 0.15, 
    outlier.shape = NA,
    show.legend = FALSE)
ggp2_box + 
    labs_raincloud_2
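
ggplot2's documentation describes the notches as extending 1.58 * IQR / sqrt(n) either side of the median, which gives a roughly 95% interval for comparing medians. A minimal base R sketch of that formula follows. It uses the built-in faithful data so it runs without extra packages; it illustrates the formula rather than ggplot2's internal code, and notch_half_width and notch_limits are names chosen for this example.

```r
# Illustration only: the notch formula applied to a built-in dataset.
x <- datasets::faithful$waiting
n <- length(x)

# half-width of the notch: 1.58 * IQR / sqrt(n)
notch_half_width <- 1.58 * stats::IQR(x) / sqrt(n)

# lower and upper notch limits around the median
notch_limits <- stats::median(x) + c(-1, 1) * notch_half_width
notch_limits
```

When the notches of two boxes do not overlap, that is usually read as evidence that the medians differ. Larger groups get narrower notches because of the sqrt(n) term.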

27.4.2 Density

We then add a horizontal density curve with ggdist::stat_halfeye(), mapping species to fill. The remaining arguments adjust the size and shape of the density curve and shift it slightly above the box plot.

ggp2_stat_halfeye <- ggp2_box +
  ggdist::stat_halfeye(aes(fill = species),
    adjust = 0.6, # bandwidth multiplier for the density estimate
    .width = 0, # interval probabilities; 0 leaves no interval to draw
    point_colour = NA, # removes the point in center
    orientation = "horizontal", # like the box plot
    height = 0.5, # height of curve
    justification = -0.3, # shift vertically above box
    show.legend = FALSE # legend not needed
  ) 
ggp2_stat_halfeye + 
    labs_raincloud_2
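
To get a feel for adjust, here is a minimal base R sketch. It assumes that adjust in stat_halfeye() acts like the adjust argument of stats::density(), that is, as a multiplier on the kernel bandwidth; the built-in faithful data and the names d_default and d_narrow are used only for illustration.

```r
# Illustration only: how `adjust` scales the bandwidth in stats::density().
x <- datasets::faithful$eruptions

d_default <- stats::density(x)               # adjust = 1
d_narrow  <- stats::density(x, adjust = 0.6) # 60% of the default bandwidth

# ratio of the two bandwidths equals the adjust value
d_narrow$bw / d_default$bw
```

Values below 1 give a bumpier curve that follows the data more closely; values above 1 give a smoother one.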

27.4.3 Points

The final layer is a geom_point(), mapping fill to species and setting position to position_jitter(). Additional adjustments to the points include:

  • Using shape = 21, which lets us color the outline of the point separately from its fill (white makes it appear to glow).

  • Manually setting the height in position_jitter(), which controls the vertical spread of the points.

ggp2_jitter <- ggp2_stat_halfeye + 
    geom_point(aes(fill = species),
        position = position_jitter(
            seed = 321, 
            height = .07),
        shape = 21, 
        color = "#ffffff", 
        alpha = 1/3, 
        size = 1.8,
        show.legend = FALSE) 
ggp2_jitter + 
    labs_raincloud_2

27.5 More examples & resources

27.5.1 Point shape

Cédric Scherer covered raincloud plots in this great write-up for #TidyTuesday.

Cédric also covers some alternative methods for plotting the points (I particularly like using bands instead of points when displaying the rainclouds vertically).

We can switch to this layout by applying ggplot2::coord_flip() to the ggp2_stat_halfeye plot, then adding geom_point() with shape set to 95 (the ASCII underscore, drawn as a short horizontal band).

ggp2_stat_halfeye + 
    ggplot2::coord_flip() + 
    ggplot2::geom_point(
        shape = 95,
        size = 8,
        alpha = 0.2) + 
    ggplot2::theme(legend.position = "none") + 
    labs_raincloud_2

27.5.2 #TidyTuesday example

The code to re-create the #TidyTuesday graph is contained in this gist.

27.5.3 More resources

  1. Raincloud plots: a multi-platform tool for robust data visualization.

  2. Shape and point sizes in R.

  3. RainCloudPlots package on GitHub.