Overlapping density plot

Graph info

Should I use this graph?

This graph requires:

✅ a categorical variable

✅ a numeric (continuous) variable

Description

Density plots are smoothed version(s) of histogram(s). They can are great for comparing the distributions of a continuous variable across the levels of a categorical variable.

geom_density() creates a kernel density estimate. The default position argument is "identity", which takes the data as is. However, we can change position to "stack" to display overlapping distributions.

PACKAGES:

Install packages.

Code

install.packages("palmerpenguins")
library(palmerpenguins)
library(ggplot2)

DATA:

Remove missing sex from the penguins data

Code

peng_density <- dplyr::filter(penguins, !is.na(sex))
dplyr::glimpse(peng_density)

Rows: 333
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
$ body_mass_g       <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
$ sex               <fct> male, female, female, female, male, female, male, fe…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

The grammar

Code
Graph

CODE:

Create labels with labs()

Initialize the graph with ggplot() and provide data

Map the flipper_length_mm to the x and sex to fill

Add the geom_density()

Set the alpha to 1/3 (to handle the overlapping areas)

Code

labs_ovrlp_density <- labs(
  title = "Adult foraging penguins",
  x = "Flipper length (millimeters)",
  fill = "Sex")
ggp2_ovrlp_density <- ggplot(data = peng_density, 
       aes(x = flipper_length_mm, 
           fill = sex)) +
      geom_density(alpha = 1/3) 
ggp2_ovrlp_density + 
  labs_ovrlp_density

GRAPH:

A downside of density plots is the lack of interpretability of the y axis

Make density area slightly transparent to handle over-plotting

More info

ggplot2 has multiple options for overlapping density plots, so which one to use will depend on how you’d like to display the relative distributions in your data. We’ll cover three options below:

STACK:

If we change the position to "stack" we can see the smoothed estimates are ‘stacked’ on top each other (and the y axis shifts slightly).

Code

labs_stack_density <- labs(
    title = "Adult foraging penguins",
    x = "Flipper length (millimeters)",
    fill = "Sex")
ggp2_stack_density <- ggplot(data = peng_density,
    mapping = aes(x = flipper_length_mm,
               fill = sex)) +
    geom_density(position = "stack",
                 alpha = 1 / 3)
ggp2_stack_density +
    labs_stack_density

Setting position to 'stack' loses marginal densities

AFTER STAT:

If we include after_stat(count) as one of our mapped aesthetics, the mapping is postponed until after statistical transformation, and uses the density * n instead of the default density.

Code

labs_after_stat_density <- labs(
  title = "Adult foraging penguins",
  x = "Flipper length (millimeters)", 
  fill = "Sex")
ggp2_after_stat_density <- ggplot(data = peng_density, 
       aes(x = flipper_length_mm, 
           after_stat(count),
           fill = sex)) +
      geom_density(position = "stack", 
                   alpha = 1/3) 
ggp2_after_stat_density + 
  labs_after_stat_density

Adding after_stat(count) ‘preserves marginal densities.’, which result in more a interpretable y axis (depending on the audience)

FILL:

Using after_stat(count) with position = "fill" creates in a conditional density estimate.

Code

labs_fill_density <- labs(
  title = "Adult foraging penguins",
  x = "Flipper length (millimeters)", 
  fill = "Sex")
ggp2_fill_density <- ggplot(data = peng_density, 
       aes(x = flipper_length_mm, 
           after_stat(count),
           fill = sex)) +
      geom_density(position = "fill", 
                   alpha = 1/3) 
ggp2_fill_density + 
  labs_fill_density

This results in a y axis ranging from 0-1, and the area filled with the relative proportional values.

--- title: "Overlapping density plot" format: html: toc: true toc-location: right toc-title: Contents code-fold: true out-height: '100%' out-width: '100%' execute: warning: false message: false --- ```{r} #| label: setup #| message: false #| warning: false #| include: false library(tidyverse) library(lubridate) library(scales) library(knitr) library(kableExtra) library(colorblindr) library(downlit) # options ---- options( repos = "https://cloud.r-project.org", dplyr.print_min = 6, dplyr.print_max = 6, scipen = 9999) # fonts ---- library(extrafont) library(sysfonts) # import font extrafont::font_import( paths = "assets/Ubuntu/", prompt = FALSE) # add font sysfonts::font_add( family = "Ubuntu", regular = "assets/Ubuntu/Ubuntu-Regular.ttf") # use font showtext::showtext_auto() # add theme source("R/theme_ggp2g.R") # set theme ggplot2::theme_set(theme_ggp2g(base_size = 13)) install.packages("palmerpenguins") library(palmerpenguins) ``` :::: {.callout-note collapse="false" icon=false} ## Graph info ::: {style="font-size: 1.25em; color: #02577A;"} **Should I use this graph?** ::: <br> ```{r} #| label: full_code_display #| eval: true #| echo: false #| warning: false #| message: false #| out-height: '60%' #| out-width: '60%' #| fig-align: right library(palmerpenguins) library(ggplot2) library(patchwork) peng_density <- filter(penguins, !is.na(sex)) labs_ovrlp_density <- labs( subtitle = "position = 'identity'", x = "Flipper length (mm)", fill = "Sex") ggp2_ovrlp_density <- ggplot(data = peng_density, aes(x = flipper_length_mm, fill = sex)) + geom_density(alpha = 1/3) ovrlp_density <- ggp2_ovrlp_density + labs_ovrlp_density + ggplot2::theme_minimal(base_size = 11) labs_stack_density <- labs( subtitle = "position = 'stack'", x = "Flipper length (mm)", fill = "Sex") ggp2_stack_density <- ggplot(data = peng_density, aes(x = flipper_length_mm, after_stat(count), fill = sex)) + geom_density(position = "stack", alpha = 1/3) stack_density <- ggp2_stack_density + labs_stack_density + ggplot2::theme_minimal(base_size = 11) ovrlp_density + stack_density + patchwork::plot_layout(ncol = 1) ``` ::: {style="font-size: 1.10em; color: #02577A;"} **This graph requires:** ::: ::: {style="font-size: 0.90em; color: #043b67;"} `r emo::ji("check")` a categorical variable ::: ::: {style="font-size: 0.90em; color: #043b67;"} `r emo::ji("check")` a numeric (continuous) variable ::: :::: ## Description Density plots are [*smoothed version(s) of histogram(s)*](https://ggplot2.tidyverse.org/reference/geom_density.html). They can are great for comparing the distributions of a continuous variable across the levels of a categorical variable. `geom_density()` creates a kernel density estimate. The default `position` argument is `"identity"`, which takes the data as is. However, we can change `position` to `"stack"` to display overlapping distributions. ## Getting set up :::: {.panel-tabset} ### Packages ::: {style="font-size: 1.15em; color: #1e83c8;"} **PACKAGES:** ::: ::: {style="font-size: 0.85em;"} Install packages. ::: ::: {style="font-size: 0.75em;"} ```{r} #| label: pkg_code_ovrlp_density #| code-fold: show #| eval: true #| echo: true #| warning: false #| message: false #| results: hide install.packages("palmerpenguins") library(palmerpenguins) library(ggplot2) ``` ::: ### Data ::: {style="font-size: 1.15em; color: #1e83c8;"} **DATA:** ::: ::: {.column-margin} ![Artwork by @allison_horst](../www/lter_penguins.png){fig-align="right" width="100%" height="100%"} ::: ::: {style="font-size: 0.85em;"} Remove missing `sex` from the `penguins` data ::: ::: {style="font-size: 0.75em;"} ```{r} #| label: data_code_ovrlp_density #| eval: true #| echo: true peng_density <- dplyr::filter(penguins, !is.na(sex)) dplyr::glimpse(peng_density) ``` ::: :::: ## The grammar :::: {.panel-tabset} ### Code ::: {style="font-size: 1.15em; color: #1e83c8;"} **CODE:** ::: ::: {style="font-size: 0.85em;"} Create labels with `labs()` Initialize the graph with `ggplot()` and provide `data` Map the `flipper_length_mm` to the `x` and `sex` to `fill` Add the `geom_density()` Set the `alpha` to `1/3` (to handle the overlapping areas) ::: ::: {style="font-size: 0.75em;"} ```{r} #| label: code_graph_ovrlp_density #| code-fold: show #| eval: false #| echo: true #| warning: false #| message: false #| out-height: '100%' #| out-width: '100%' #| column: page-inset-right #| layout-nrow: 1 labs_ovrlp_density <- labs( title = "Adult foraging penguins", x = "Flipper length (millimeters)", fill = "Sex") ggp2_ovrlp_density <- ggplot(data = peng_density, aes(x = flipper_length_mm, fill = sex)) + geom_density(alpha = 1/3) ggp2_ovrlp_density + labs_ovrlp_density ``` ::: ### Graph ::: {style="font-size: 1.15em; color: #1e83c8;"} **GRAPH:** ::: ::: {.column-margin} *A downside of density plots is the lack of interpretability of the `y` axis* ::: ::: {style="font-size: 0.85em;"} Make density area slightly transparent to handle over-plotting ::: ```{r} #| label: create_graph_ovrlp_density #| eval: true #| echo: false #| warning: false #| message: false #| out-height: '100%' #| out-width: '100%' #| column: page-inset-right #| layout-nrow: 1 labs_ovrlp_density <- labs( title = "Adult foraging penguins", x = "Flipper length (millimeters)", fill = "Sex") ggp2_ovrlp_density <- ggplot(data = peng_density, aes(x = flipper_length_mm, fill = sex)) + geom_density(alpha = 1/3) ggp2_ovrlp_density + labs_ovrlp_density ``` :::: ## More info `ggplot2` has multiple options for overlapping density plots, so which one to use will depend on how you'd like to display the relative distributions in your data. We'll cover three options below: :::: {.panel-tabset} ### `"stack"` ::: {style="font-size: 1.15em; color: #1e83c8;"} **STACK:** ::: ::: {style="font-size: 0.85em;"} If we change the `position` to `"stack"` we can see the smoothed estimates are 'stacked' on top each other (and the `y` axis shifts slightly). ::: ::: {style="font-size: 0.75em;"} ```{r} #| label: code_graph_stack_density #| eval: true #| echo: true #| warning: false #| column: page-inset-right #| layout-nrow: 1 #| out-height: '100%' #| out-width: '100%' labs_stack_density <- labs( title = "Adult foraging penguins", x = "Flipper length (millimeters)", fill = "Sex") ggp2_stack_density <- ggplot(data = peng_density, mapping = aes(x = flipper_length_mm, fill = sex)) + geom_density(position = "stack", alpha = 1 / 3) ggp2_stack_density + labs_stack_density ``` ::: ::: {style="font-size: 0.85em;"} Setting `position` to `'stack'` [loses marginal densities](https://ggplot2.tidyverse.org/reference/geom_density.html#ref-examples) ::: ### `after_stat(count)` ::: {style="font-size: 1.15em; color: #1e83c8;"} **AFTER STAT:** ::: ::: {style="font-size: 0.85em;"} If we include `after_stat(count)` as one of our mapped aesthetics, the mapping is postponed until after statistical transformation, and uses the [*`density * n` instead of the default density*](https://ggplot2.tidyverse.org/reference/geom_density.html#computed-variables). ::: ::: {style="font-size: 0.75em;"} ```{r} #| label: code_graph_after_stat_density #| eval: true #| echo: true #| warning: false #| message: false #| out-height: '100%' #| out-width: '100%' #| column: page-inset-right #| layout-nrow: 1 labs_after_stat_density <- labs( title = "Adult foraging penguins", x = "Flipper length (millimeters)", fill = "Sex") ggp2_after_stat_density <- ggplot(data = peng_density, aes(x = flipper_length_mm, after_stat(count), fill = sex)) + geom_density(position = "stack", alpha = 1/3) ggp2_after_stat_density + labs_after_stat_density ``` ::: ::: {style="font-size: 0.85em;"} Adding [`after_stat(count)`](https://ggplot2.tidyverse.org/reference/geom_density.html#computed-variables) *'preserves marginal densities.'*, which result in more a interpretable `y` axis (depending on the audience) ::: ### `"fill"` ::: {style="font-size: 1.15em; color: #1e83c8;"} **FILL:** ::: ::: {style="font-size: 0.85em;"} Using `after_stat(count)` with `position = "fill"` creates in a conditional density estimate. ::: ::: {style="font-size: 0.75em;"} ```{r} #| label: code_graph_fill_density #| eval: true #| echo: true #| warning: false #| message: false #| out-height: '100%' #| out-width: '100%' #| column: page-inset-right #| layout-nrow: 1 labs_fill_density <- labs( title = "Adult foraging penguins", x = "Flipper length (millimeters)", fill = "Sex") ggp2_fill_density <- ggplot(data = peng_density, aes(x = flipper_length_mm, after_stat(count), fill = sex)) + geom_density(position = "fill", alpha = 1/3) ggp2_fill_density + labs_fill_density ``` ::: ::: {style="font-size: 0.85em;"} This results in a `y` axis ranging from 0-1, and the area filled with the relative proportional values. ::: ::::