ggplot2 graph for SEG Shiny project
This project is maintained by mjfrigaard
Martin Frigaard
This document outlines a graph I built for a Shiny application.
The interface accepts a single .csv file with two columns containing blood glucose monitor measurements. These data get assigned a risk factor value, and the Shiny app plots the values to visually display the level of clinical risk in potentially inaccurate blood glucose monitors. This graph is referred to as the Surveillance Error Grid (SEG). The app also creates summary tables and a modified Bland-Altman plot.
In the first versions of the application, the SEG graph didn’t look very clean because of the reference values used to create the underlying plot. Below I describe the steps I took to figure out what was going on, and the packages/methods I used to fix it.
TOPICS COVERED:
ggplot2 for graphing
png for importing images
grid::rasterGrob() for creating a background in a ggplot2 graph
lots of dplyr and tidyr for wrangling
These are the packages you’ll need to reproduce this page:
library(tidyverse)
library(magrittr)
library(shiny)
library(rsconnect)
library(gplots)
library(tools)
library(jpeg)
library(png)
In order to create the SEG graphs and tables, I need to load a few data inputs from Github. The code chunk below loads:
RiskPairData -> this assign a risk factor value to each BGM
measurementSampMeasData -> this is a small sample measurement datasetVanBltComp -> this is a dataset from Vanderbilt used to create
some of the initial calculations# HEAT MAP DATA INPUTS ============= ----
# download AppRiskPairData.csv from github ---- ---- ---- ----
github_root <- "https://raw.githubusercontent.com/mjfrigaard/seg-shiny-data/master/"
app_riskpair_repo <- "Data/AppRiskPairData.csv"
# download to data repo
if (!file.exists("data/")) {
dir.create("data/")
}
# download Riskpairdata --------
utils::download.file(url = paste0(github_root, app_riskpair_repo),
destfile = "data/Riskpairdata.csv")
# download Sampledata data --------
samp_meas_data_rep <- "Data/FullSampleData.csv"
utils::download.file(url = paste0(github_root, samp_meas_data_rep),
destfile = "data/Sampledata.csv")
# download VanderbiltComplete data --------
vand_comp_data_rep <- "Data/VanderbiltComplete.csv"
utils::download.file(url = paste0(github_root, vand_comp_data_rep),
destfile = "data/VanderbiltComplete.csv")
# Read in the RiskPairData & SampMeasData
RiskPairData <- readr::read_csv(file = "data/Riskpairdata.csv")
## Parsed with column specification:
## cols(
## RiskPairID = col_double(),
## REF = col_double(),
## BGM = col_double(),
## RiskFactor = col_double(),
## abs_risk = col_double()
## )
SampMeasData <- readr::read_csv(file = "data/Sampledata.csv")
## Parsed with column specification:
## cols(
## BGM = col_double(),
## REF = col_double()
## )
VanBltComp <- readr::read_csv(file = "data/VanderbiltComplete.csv")
## Parsed with column specification:
## cols(
## BGM = col_double(),
## REF = col_double()
## )
# mmol conversion factor ---- -----
mmolConvFactor <- 18.01806
# rgb2hex function ---- -----
# This is the RGB to Hex number function for R
rgb2hex <- function(r, g, b) rgb(r, g, b, maxColorValue = 255)
# risk factor colors ---- -----
# These are the values for the colors in the heatmap.
abs_risk_0.0000_color <- rgb2hex(0, 165, 0)
# abs_risk_0.0000_color
abs_risk_0.4375_color <- rgb2hex(0, 255, 0)
# abs_risk_0.4375_color
abs_risk_1.0625_color <- rgb2hex(255, 255, 0)
# abs_risk_1.0625_color
abs_risk_2.7500_color <- rgb2hex(255, 0, 0)
# abs_risk_2.7500_color
abs_risk_4.0000_color <- rgb2hex(128, 0, 0)
# abs_risk_4.0000_color
riskfactor_colors <- c(
abs_risk_0.0000_color,
abs_risk_0.4375_color,
abs_risk_1.0625_color,
abs_risk_2.7500_color,
abs_risk_4.0000_color
)
# create base_data data frame ---- -----
base_data <- data.frame(
x_coordinate = 0,
y_coordinate = 0,
color_gradient = c(0:4)
)
RiskPairDataThis data has columns and risk pairs for both REF and BGM, and the
RiskFactor variable for each pair of REF and BGM data. Below you
can see a sample of the REF, BGM, RiskFactor, and abs_risk
variables.
RiskPairData %>%
dplyr::sample_n(size = 10) %>%
dplyr::select(REF, BGM, RiskFactor, abs_risk)
## # A tibble: 10 x 4
## REF BGM RiskFactor abs_risk
## <dbl> <dbl> <dbl> <dbl>
## 1 516 588 -0.00254 0.00254
## 2 145 584 -2.41 2.41
## 3 484 481 0 0
## 4 478 340 0.514 0.514
## 5 515 528 0 0
## 6 3 178 -3.00 3.00
## 7 476 116 1.98 1.98
## 8 567 231 1.08 1.08
## 9 431 236 0.931 0.931
## 10 557 318 0.618 0.618
SampMeasDataThis data set mimics a blood glucose monitor, with only BGM and REF
values.
SampMeasData %>%
dplyr::sample_n(size = 10) %>%
dplyr::select(REF, BGM)
## # A tibble: 10 x 2
## REF BGM
## <dbl> <dbl>
## 1 120 134
## 2 149 142
## 3 131 132
## 4 86 90
## 5 162 152
## 6 151 144
## 7 96 106
## 8 119 125
## 9 111 106
## 10 376 335
VanBltCompThis larger data set contains blood glucose monitor measurements, with
only BGM and REF values.
VanBltComp %>%
dplyr::sample_n(size = 10) %>%
dplyr::select(REF, BGM)
## # A tibble: 10 x 2
## REF BGM
## <dbl> <dbl>
## 1 164 152
## 2 118 125
## 3 122 107
## 4 135 157
## 5 163 153
## 6 162 148
## 7 297 283
## 8 78 89
## 9 199 187
## 10 71 68
In earlier versions of the application, the heatmap background wasn’t smoothed like the original Excel application. This file documents how I changed the SEG graph using a pre-made .png image.
This image is from the Excel application.

The points are plotted against a Gaussian smoothed background image.
The steps/code to create the current ggplot2 image are below (this
code will take a bit to run).
# 1 - base layer ---- ---- ---- ---- ---- ---- ---- ----
base_layer <- ggplot() +
geom_point(
data = base_data, # defines data frame
aes(
x = x_coordinate,
y = y_coordinate,
fill = color_gradient
)
) # + # uses x, y, color_gradient
# 2 - risk pair data layer ---- ---- ---- ---- ---- ---- ---- ----
risk_layer <- base_layer +
geom_point(
data = RiskPairData, # new data set
aes(
x = REF, # additional aesthetics from new data set
y = BGM,
color = abs_risk
),
show.legend = FALSE
) +
ggplot2::scale_color_gradientn(
colors = riskfactor_colors, # these are defined above with rgb2hex function
guide = "none",
limits = c(0, 4),
values = scales::rescale(c(
0, # darkgreen
0.4375, # green
1.0625, # yellow
2.7500, # red
4.0000
))
)
# 3 - add color gradient ---- ---- ---- ---- ---- ---- ---- ---- ----
risk_level_color_gradient <- risk_layer +
ggplot2::scale_fill_gradientn( # scale_*_gradientn creats a n-color gradient
values = scales::rescale(c(
0, # darkgreen
0.4375, # green
1.0625, # yellow
2.75, # red
4.0 # brown
)),
limits = c(0, 4),
colors = riskfactor_colors,
guide = guide_colorbar(
ticks = FALSE,
barheight = unit(100, "mm")
),
breaks = c(
0.25,
1,
2,
3,
3.75
),
labels = c(
"none",
"slight",
"moderate",
"high",
"extreme"
),
name = "risk level"
)
# 4 - add x and y axis ---- ---- ---- ---- ---- ---- ---- ---- ----
# Add the new color scales to the scale_y_continuous()
heatmap_plot <- risk_level_color_gradient +
ggplot2::scale_y_continuous(
limits = c(0, 600),
sec.axis =
sec_axis(~. / mmolConvFactor,
name = "measured blood glucose (mmol/L)"
),
name = "measured blood glucose (mg/dL)"
) +
scale_x_continuous(
limits = c(0, 600),
sec.axis =
sec_axis(~. / mmolConvFactor,
name = "reference blood glucose (mmol/L)"
),
name = "reference blood glucose (mg/dL)"
)
heatmap_plot

When we re-create the graph using the risk pair data, we get see sharp
edges for values over 400 mg/dL and ~ 22 mmol/L. This is because of the
relationships between the RiskFactor and BGM / REF values. I’ll
outline these two measurements below.
RiskFactor vs. BGM/REFIf we look at RiskFactor as a function of seg_val, we see the
following.
RiskPairData %>%
tidyr::gather(key = "seg_key",
value = "seg_val",
c(REF, BGM)) %>%
ggplot(aes(x = seg_val, y = RiskFactor, group = seg_key)) +
geom_point(aes(color = seg_key), alpha = 1/8, size = 0.5)

The values of RiskFactor do not change much for the BGM and REF
values of 400-450, 450-500, and 500-600.
abs_risk VS REF/BGMIf we plot the absolute value of the RiskFactor, we see a similar
pattern.
RiskPairData %>%
tidyr::gather(key = "seg_key",
value = "seg_val",
c(REF, BGM)) %>%
ggplot(aes(x = seg_val, y = abs_risk, group = seg_key)) +
geom_point(aes(color = seg_key), alpha = 1/8, size = 0.5)

The same lines are seen when the absolute value of RiskFactor is
plotted against the BGM and REF values. This explains why the plot
below looks the way it does. The sharp lines are a result of the minimal
change in RiskFactor (or abs_risk) for BGM and REF values of
400-450, and 500-600.
This is the image from the excel file.

I can use this image in the plot as a background and layer the data points from the sample data on top.
Load the image into RStudio and assign it to an object with
png::readPNG().
# read in as png
BackgroundSmooth <- png::readPNG("images/seg600.png")
This is the new plot without any data added to the image. All I do here is create the x and y axes and set a four-point color gradient.
base_layer <- base_data %>%
ggplot(aes(
x = x_coordinate,
y = y_coordinate,
fill = color_gradient)) +
geom_point(size = 0.00000001,
color = "white")
scales_layer <- base_layer +
ggplot2::scale_y_continuous(
limits = c(0, 600),
sec.axis =
sec_axis(~. / mmolConvFactor,
name = "Measured blood glucose (mmol/L)"
),
name = "Measured blood glucose (mg/dL)"
) +
scale_x_continuous(
limits = c(0, 600),
sec.axis =
sec_axis(~. / mmolConvFactor,
name = "Reference blood glucose (mmol/L)"
),
name = "Reference blood glucose (mg/dL)"
)
scales_layer

This is the Gaussian image layer. Now that I have the axes set, I can
set the xmin, xmax, ymin, and ymax values in my plot.
gaussian_layer <- scales_layer +
ggplot2::annotation_custom(
grid::rasterGrob(image = BackgroundSmooth,
width = unit(1,"npc"),
height = unit(1,"npc")),
xmin = 0,
xmax = 600,
ymin = 0,
ymax = 600)
gaussian_layer

In this layer I add the color gradient scaling and coloring, and also the custom labels for each level.
gaussian_gradient_layer <- gaussian_layer +
ggplot2::scale_fill_gradientn( # scale_*_gradientn creats a n-color gradient
values = scales::rescale(c(
0, # darkgreen
0.4375, # green
1.0625, # yellow
2.75, # red
4.0 # brown
)),
limits = c(0, 4),
colors = riskfactor_colors,
guide = guide_colorbar(
ticks = FALSE,
barheight = unit(100, "mm")
),
breaks = c(
0.25,
1,
2,
3,
3.75
),
labels = c(
"none",
"slight",
"moderate",
"high",
"extreme"
),
name = "risk level")
gaussian_gradient_layer

In the final layer, I add the sample data to the plot.
heatmap_plot <- gaussian_gradient_layer +
geom_point(
data = SampMeasData, # introduce sample data frame
aes(
x = REF,
y = BGM
),
shape = 21,
fill = "white",
size = 1.1,
stroke = 0.4,
alpha = 0.8
)
heatmap_plot

Now I just need to add this to the application.