2D histograms
Description
Standard histograms separate a variable’s values into discrete groups, or ‘bins,’ which are arranged in increasing order across the x axis. The y axis displays the frequency (or count) of values within each bin.
Vertical bars capture the variable’s distribution: the height of each bar represents the number of values in its bin, and the number of bars is determined by the number of bins (or, equivalently, the bin width).
When we extend this display to two numerical (quantitative) variables, the bins divide the plot area into a grid, and color displays the frequency (or count) of observations whose pair of values falls within each grid cell.
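To make the grid idea concrete, the counts behind a 2D histogram can be sketched in base R. This is only an illustration under stated assumptions: it uses the built-in faithful dataset as a stand-in (not the penguins data used below), and the choice of 5 bins per axis is arbitrary. The break points that geom_bin2d() picks may differ from those chosen by cut().

```r
# Built-in dataset: eruption durations and waiting times for Old Faithful.
x <- faithful$eruptions
y <- faithful$waiting

# cut() assigns each value to one of 5 equal-width intervals per axis.
x_bin <- cut(x, breaks = 5)
y_bin <- cut(y, breaks = 5)

# table() cross-tabulates the two sets of bins: one count per grid cell.
counts <- table(x_bin, y_bin)
counts

# Every observation lands in exactly one cell of the 5 x 5 grid.
sum(counts) == nrow(faithful)
```

A 2D histogram is essentially this table drawn as tiles, with each cell's count mapped to a fill color.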
Getting set up
PACKAGES:
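Install and load packages.
Code
install.packages("palmerpenguins")
library(palmerpenguins)
library(ggplot2)
# colorspace provides scale_fill_continuous_sequential() and hcl_palettes()
library(colorspace)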
DATA:
We’ll take the flipper_length_mm, bill_length_mm, sex, species, and body_mass_g variables from palmerpenguins::penguins and drop the missing values.
Code
penguins_2dhist <- palmerpenguins::penguins |>
  dplyr::select(flipper_length_mm, bill_length_mm,
                sex, species, body_mass_g) |>
  tidyr::drop_na()
dplyr::glimpse(penguins_2dhist)
Rows: 333
Columns: 5
$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
$ sex <fct> male, female, female, female, male, female, male, fe…
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ body_mass_g <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
The grammar
CODE:
Create labels with labs()
Initialize the graph with ggplot() and provide data
Map bill_length_mm to the x and flipper_length_mm to the y
Add the geom_bin2d() layer
Code
labs_2dhist <- labs(
  title = "Adult Foraging Penguins",
  subtitle = "Near Palmer Station, Antarctica",
  x = "Bill length (mm)",
  y = "Flipper length (mm)")
ggp2_2dhist <- ggplot(data = penguins_2dhist,
    mapping = aes(x = bill_length_mm,
                  y = flipper_length_mm)) +
  geom_bin2d()
ggp2_2dhist +
  labs_2dhist
GRAPH:
More info
BINS:
The value for bins will vary depending on the variable values; there is no correct number. If the number of bins is too low, the coarse grid may hide important nuances in the relationship between the variables. If the number of bins is too high, the noise might drown out the signal.
Below we change the bins to 15 and save this layer as ggp2_2dbins15:
Code
ggp2_base <- ggplot(data = penguins_2dhist,
    mapping = aes(x = bill_length_mm,
                  y = flipper_length_mm))
ggp2_2dbins15 <- ggp2_base +
  geom_bin2d(bins = 15)
ggp2_2dbins15 +
  labs_2dhist
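Instead of a number of bins, geom_bin2d() also accepts a binwidth argument giving the cell size in data units, which can be easier to reason about when the two axes have different ranges. The sketch below is a hedged illustration: the widths of 2 mm and 5 mm and the object name ggp2_2dbinwidth are assumptions chosen for this example, not values from the original tutorial.

```r
library(ggplot2)
library(palmerpenguins)

# Same preparation as in the setup section: select columns, drop missing rows.
penguins_2dhist <- palmerpenguins::penguins |>
  dplyr::select(flipper_length_mm, bill_length_mm,
                sex, species, body_mass_g) |>
  tidyr::drop_na()

# binwidth = c(x, y): each cell spans 2 mm of bill length (x)
# and 5 mm of flipper length (y). These widths are illustrative only.
ggp2_2dbinwidth <- ggplot(data = penguins_2dhist,
    mapping = aes(x = bill_length_mm,
                  y = flipper_length_mm)) +
  geom_bin2d(binwidth = c(2, 5))

ggp2_2dbinwidth

# layer_data() returns the computed cells, including a `count` column,
# which helps when judging whether the grid is too coarse or too fine.
head(layer_data(ggp2_2dbinwidth))
```

Inspecting the computed counts alongside the plot is one way to compare candidate bin settings before settling on one.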
SCALES:
scale_fill_continuous_sequential() from the colorspace package comes with a variety of palettes to choose from (run hcl_palettes(type = "sequential") to view the full list).
We can also reverse the order of the fill color scale with rev (TRUE or FALSE).
Code
# rev = TRUE: reversed palette order
ggp2_2dbins15 +
  scale_fill_continuous_sequential(
    palette = "Mako",
    rev = TRUE) +
  labs_2dhist
# rev = FALSE: default palette order
ggp2_2dbins15 +
  scale_fill_continuous_sequential(
    palette = "Mako",
    rev = FALSE) +
  labs_2dhist
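To preview a palette before applying it to a plot, the colors can be generated directly with colorspace. This is a small sketch, assuming a colorspace version that includes the "Mako" palette; the object names mako_default and mako_reversed and the choice of five colors are made up for illustration.

```r
library(colorspace)

# Five colors from the "Mako" sequential palette, in default order.
mako_default <- sequential_hcl(n = 5, palette = "Mako")
# The same palette requested with the order reversed.
mako_reversed <- sequential_hcl(n = 5, palette = "Mako", rev = TRUE)

# Print the hex codes for both orderings.
mako_default
mako_reversed

# swatchplot() draws each vector as a row of color swatches.
swatchplot(list(default = mako_default, reversed = mako_reversed))
```

Comparing the two rows shows which end of the palette will be assigned to low counts under each rev setting.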
OPTIONS:
If you set the point shape to 21, you have control over both color and fill.
In the previous example we showed how to reverse the color scale for the palette in scale_fill_continuous_sequential(). Below we reverse the color scale, but also manually set which color on the scale we want to begin with (i.e., the smallest data value) and which color we want to end with (i.e., the largest data value). Possible values range from 0 to 1.
We also add a geom_point() layer.
Code
ggp2_2dbins15 +
  scale_fill_continuous_sequential(
    palette = "SunsetDark",
    rev = TRUE,
    begin = 0.7, end = 0.2) +
  geom_point(color = "#007bff",
    fill = "#FFFFFF", shape = 21,
    size = 2.2, alpha = 0.75) +
  labs_2dhist