Covid - flexdashboard (documentation)
Packages
These are the packages for data wrangling, import, and visualization.
# dashboard
library(flexdashboard)
library(knitr)
library(DT)
library(janitor)
# theme
library(brotools)
library(hrbrthemes)
library(ggthemes)
library(bbplot)
# data wrangling
library(tidyverse)
library(lubridate)
library(socviz)
# data visualization
library(plotly)
library(gganimate)
library(gifski)
library(skimr)
# custom skim
skimr::skim_with(
my_skim <-numeric = skimr::sfl(p25 = NULL, p75 = NULL))
# map
library(rmapshaper)
library(sugarbag)
Updates (version 0.1.4)
-
Wrangling steps have been moved into
helpers.R
script (includes bothimport
andwrangle
scripts from thecode
folder)
-
Replace all use of
worldwide
withglobal
-
Color background of global maps set to
"whitesmoke"
-
Global maps for
New Cases
(orthographic
) andRecovered
(Mercator
)
-
Change contents of
valueBox()
s to sentence case
-
new colors in
valueBox()
’s "#B22222"
,"#EE4000"
,"#EEE9E9"
,"#00FF7F"
,"#FFFFF0"
-
Add
geofacet
maps and usecovdata
package for comparing NYT data with COVID tracking project
-
Fix
geofacet
units ony
axes (non-scientific notation)
-
add
bbplot
theme for animated graphs
To-Do/Changes coming
- investigate git warning for large file storage:
# remote: warning: GH001: Large files detected. You may want to try Git Large File
# Storage - https://git-lfs.github.com.
# remote: warning: See http://git.io/iEPt8g for more information.
# remote: warning: File data/processed/2020-07-19/2020-07-19-USTSDataAll.csv is
# 52.00 MB; this is larger than GitHub's recommended maximum file
# size of 50.00 MB
# To https://github.com/mjfrigaard/covid-dashboard.git
# 2973303..a2486af master -> master
Already installed Git Large File Storage (LFS) using,
brew install git-lfs # Update global git config git lfs install # Update system git config sudo -i git lfs install --system
Add county maps
Include Rt for each county
Pretty up the
reactable
on the global data
DATA: Johns Hopkins University’s CSSE
The COVID-19 data objects are loaded below from various sources.
Some of these are updated daily, others are pre-existing datasets for geographic locators or other variables to aid in visualizing/mapping.
Import CSSE time series data from Github
This imports data from the CSSEGISandData
. These are in the csse_covid_19_time_series
folder. These files are updated daily.
# fs::dir_ls("code/")
source("code/01.0-import-csse-time-series.R")
The following info comes from the README in the time series repo folder:
## Time series summary (csse_covid_19_time_series)
This folder contains daily time series summary tables, including confirmed,
deaths and recovered. All data is read in from the daily case report.
The time series tables are subject to be updated if inaccuracies are identified
in our historical data. The daily reports will not be adjusted in these
instances to maintain a record of raw data.
Two time series tables are for the US confirmed cases and deaths, reported at`time_series_covid19_confirmed_US.csv`,
the county level. They are named `time_series_covid19_deaths_US.csv`, respectively.
Three time series tables are for the global confirmed cases, recovered cases
and deaths. Australia, Canada and China are reported at the province/state
level. Dependencies of the Netherlands, the UK, France and Denmark are listed
under the province/state level. The US and other countries are at the country`time_series_covid19_confirmed_global.csv` and
level. The tables are renamed `time_series_covid19_deaths_global.csv`, and
`time_series_covid19_recovered_global.csv`, respectively.
### Update frequency
Once a day around 23:59 (UTC).
### Deprecated warning
The files below were archived here, and will no longer be updated. With the
release of the new data structure, we are updating our time series tables to
reflect these changes.
`time_series_covid19_confirmed_global.csv` and
Please reference `time_series_covid19_deaths_global.csv` for the latest time series data.
`time_series_19-covid-Confirmed.csv`
`time_series_19-covid-Deaths.csv`
`time_series_19-covid-Recovered.csv`
Export the raw csse covid 19 time series data (global)
All of the raw data files are in this list
names(covid_raw_data_files)
#> [1] "GDPRaw" "TSConfirmedRaw" "TSConfirmedUSRaw" "TSDeathsRaw"
#> [5] "TSDeathsUSRaw" "TSRecoveredRaw"
I’ve dropped each datasets into a raw
folder with a date stamp for safe keeping.
::dir_tree(paste0("data/raw/",
fs::noquote(lubridate::today())),
baserecurse = FALSE)
#> data/raw/2020-07-19
#> ├── 2020-07-19-GDPRaw.csv
#> ├── 2020-07-19-TSConfirmedRaw.csv
#> ├── 2020-07-19-TSConfirmedUSRaw.csv
#> ├── 2020-07-19-TSDeathsRaw.csv
#> ├── 2020-07-19-TSDeathsUSRaw.csv
#> └── 2020-07-19-TSRecoveredRaw.csv
Wrangling CSSE time series data
The code chunk below runs the script for the wrangling steps to create the data visualizations using the time series CSSE data.
# fs::dir_ls("code/")
source("code/02.0-wrangle-csse-time-series.R")
The following steps were taken to wrangle the time series data:
- Convert wide to long (
Confirmed
,Recovered
,Deaths
)
- first I converted
TSConfirmedRaw
dataset to long form, and converts theDate
variable tomdy()
- Create
WorldTSDataAll
by joiningConfirmed
,Recovered
,Deaths
- This joins the
Confirmed
,Recovered
, andDeaths
together intoWorldTSDataAll
USTSDataAll
= joinConfirmedUS
andDeathsUS
- I want to mimic what I did with the
WorldTSDataAll
and join these two together. I wantcountry_region
to just be namedcountry
, andprovince_state
to just be namedstate
. Export these files to processed folder
- Create
SumRegionDate
- this groups the
WorldTSDataAll
data bycountry_region
anddate
, then summarizes theconfirmed_sum
,recovered_sum
, anddeaths_sum
variables. Then it creates a “new case” column withdplyr::lag()
withconfirmed_sum
andfilters
thedate
tomax(date)
- create a most recent day from
SumRegionDate
calledrecent_day
- GDP Country Codes: create a smaller version of the
GDPRaw
dataset. I also rename some of theregions
inGdp2016
%>%
Gdp2016 dplyr::filter(str_detect(string = region, pattern = "Gambia, The"))
#> # A tibble: 1 x 4
#> region code year country_region
#> <chr> <chr> <dbl> <chr>
#> 1 Gambia, The GMB 2016 Gambia
- Create
SumRegionDateCodes
by joiningSumRegionDate
andGdp2016
- Join the
SumRegionDate
to theGdp2016
data country. And because this is the first complete dataset I will be using for data visualizations, I will export this into thedata/processed
folder
::dir_ls(paste0("data/processed/",
fs::noquote(lubridate::today())),
baseregexp = "SumRegionDate.csv")
#> data/processed/2020-07-19/2020-07-19-SumRegionDate.csv
Dashboard layout
This section covers the data visualizations in the dashboard. Because the dashboard is built using multiple pages, with orientation: columns
and vertical_layout: fill
. Read more about this layout here.
Page 1: Global COVID-19 Data (Maps)
Page 1 is titled, Global COVID-19 Data (Maps)
. This contains the data from the WorldTSRecent
dataset, the
Column 1: data-width=300 .bg-secondary
Column {data-width=250 .bg-secondary}
-----------------------------------------------------------------------
These are built using the following valueBox()
s. The dataset WorldTSRecent
is below:
::paged_table(
rmarkdownhead(WorldTSRecent)
)
This is a tibble with a single row for the most recent stats.
The valueBox()
and prettyNum()
functions display the objects below:
Total global confirmed cases as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$confirmed_sum WorldTSRecent
#> [1] 14288689
### `r paste0("Total global confirmed cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$confirmed_sum, big.mark = ","), color = "tomato")
New global cases as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$`New Case` WorldTSRecent
#> [1] 233390
### `r paste0("New global cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$`New Case`, big.mark = ","), color = "sandybrown")
Global deaths as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$deaths_sum WorldTSRecent
#> [1] 602138
### `r paste0("Global deaths as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$deaths_sum, big.mark = ","), color = "gray50")
Global recovered cases as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$recovered_sum WorldTSRecent
#> [1] 7904159
### `r paste0("Global recovered cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$recovered_sum, big.mark = ","), color = "palegreen")
Days since first confirmed case at…
$date WorldTSRecent
#> [1] "2020-07-18"
case_no1
#> [1] "2020-01-22"
days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case at ", case_no1)`
valueBox(prettyNum(days_passed, big.mark = ","), color = "lightgoldenrodyellow")
Column 2: data-width=700 .tabset
Column {data-width=700 .tabset}
-----------------------------------------------------------------------
This is the plotly::plot_geo()
world map. The dataset that it requires is SumRegionDateCodes
, and it’s available to view below:
::paged_table(
rmarkdownhead(SumRegionDateCodes)
)
Tab 1: Global Confirmed Cases (.tabset
)
The visualization uses the plotly::plot_geo()
function, which renders a full interactive globe!
# create recent_day
max(SumRegionDateCodes$date)
recent_day <-# Set country boundaries as light gray
list(color = toRGB("#d1d1d1"), width = 0.2)
line <-
# create geo for map options
list(
geo <-bgcolor = "whitesmoke",
showframe = FALSE,
showcoastlines = FALSE,
# this is the globe option
projection = list(type = "orthographic"),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
plotly::plot_geo() %>%
geo_map_confirm_cases <- layout(
geo = geo,
paper_bgcolor = "whitesmoke",
title = paste0("Global COVID-19 confirmed cases as of ",
%>%
recent_day)) add_trace(
data = SumRegionDateCodes,
z = ~Confirmed,
color = ~Confirmed,
colors = "Reds",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_confirm_cases
Tab 2: Global New Cases (.tabset
)
# create recent_day
max(SumRegionDateCodes$date)
recent_day <-# Set country boundaries as light gray
list(color = toRGB("#d1d1d1"), width = 0.2)
line <-
# create geo for map options
list(
geo <-bgcolor = "whitesmoke",
showframe = FALSE,
showcoastlines = FALSE,
# this is the globe option
projection = list(type = "orthographic"),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
plotly::plot_geo() %>%
geo_map_new_cases <- layout(
geo = geo,
paper_bgcolor = "whitesmoke",
title = paste0("Global COVID-19 new cases as of ",
%>%
recent_day)) add_trace(
data = SumRegionDateCodes,
z = ~`New Cases`,
color = ~`New Cases`,
colors = "Oranges",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_new_cases
Tab 3: Global Deaths (.tabset
)
This is now a Mercator
map of the confirmed deaths from COVID.
# Set country boundaries as light gray
list(color = toRGB("#d1d1d1"), width = 0.2)
line <-
# create geo for map options
# c("#B0E0E6", "#F0FFF0")
list(
geo <-oceancolor = "whitesmoke",
showframe = FALSE,
showcoastlines = FALSE,
# this is the mercator option
projection = list(type = 'Mercator'),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
plotly::plot_geo() %>%
geo_map_deaths <- layout(
geo = geo,
paper_bgcolor = "whitesmoke",
title = paste0("Global COVID-19 deaths as of ",
%>%
recent_day)) add_trace(
data = SumRegionDateCodes,
z = ~Deaths,
color = ~Deaths,
colors = "Greys",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_deaths
.tabset
)
# Set country boundaries as light gray
list(color = toRGB("#d1d1d1"), width = 0.2)
line <-
# create geo for map options
# c("#B0E0E6", "#F0FFF0")
list(
geo <-oceancolor = "aliceblue",
showframe = FALSE,
showcoastlines = FALSE,
# this is the mercator option
projection = list(type = 'Mercator'),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
plotly::plot_geo() %>%
geo_map_recovered <- layout(
geo = geo,
paper_bgcolor = "aliceblue",
title = paste0("World COVID-19 recovered cases as of ",
%>%
recent_day)) add_trace(
data = SumRegionDateCodes,
z = ~Recovered,
color = ~Recovered,
colors = "Greens",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_recovered
Tab 5: Dataset (Old)
This is the old table with DT
.
%>%
SumRegionDateCodes dplyr::select(`Country region` = country_region,
`Country code` = code,
Date = date,
Confirmed,`New Cases`,
Recovered,%>%
Deaths) dplyr::arrange(desc(Confirmed)) %>%
DT::datatable(
rownames = FALSE,
fillContainer = TRUE,
options = list(
bPaginate = FALSE))
Tab 5: Dataset (new)
It has been converted to a reactable
.
library(reactable)
SumRegionDateCodes %>%
data <- dplyr::select(Country = country_region,
`Country code` = code,
Date = date,
Confirmed,`New Cases`,
Recovered,%>%
Deaths) dplyr::arrange(desc(Confirmed))
::reactable(data,
reactabledefaultSorted = "Confirmed",
columns = list(
Confirmed = colDef(
name = "Confirmed",
defaultSortOrder = "desc",
format = colFormat(prefix = "")
),Country = colDef(
name = "Country",
defaultSortOrder = "desc",
format = colFormat(separators = TRUE)
),Date = colDef(
name = "Date",
defaultSortOrder = "desc",
format = colFormat(separators = TRUE)
# format = colFormat(percent = TRUE, digits = 1)
)
) )
To do: Convert this table to sparkline
and add histograms.
Page 2: Global COVID-19 Cases (Graphs)
Column 1: data-width=300 .bg-secondary
Total global confirmed cases as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$confirmed_sum WorldTSRecent
#> [1] 14288689
### `r paste0("Total global confirmed cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$confirmed_sum, big.mark = ","), color = "#B22222")
New global cases as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$`New Case` WorldTSRecent
#> [1] 233390
### `r paste0("New global cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$`New Case`, big.mark = ","), color = "#EE4000")
Global deaths as of…
$date WorldTSRecent
#> [1] "2020-07-18"
$deaths_sum WorldTSRecent
#> [1] 602138
### `r paste0("Global deaths as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$deaths_sum, big.mark = ","), color = "#EEE9E9")
Global recovered cases as of
$date WorldTSRecent
#> [1] "2020-07-18"
$recovered_sum WorldTSRecent
#> [1] 7904159
### `r paste0("Global recovered cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$recovered_sum, big.mark = ","), color = "#00FF7F")
Days since first confirmed case at…
case_no1
#> [1] "2020-01-22"
days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case at ", case_no1)`
valueBox(prettyNum(days_passed, big.mark = ","), color = "#FFFFF0")
Column 2: data-width=700 .tabset
Data inputs:
- Create
WorldTSDataAllDate
andWorldTSDataRecent
WorldTSDataAllDate
isWorldTSDataAll
grouped bydate
- This groups by the
date
column, the summarized theconfirmed
,deaths
, andrecovered
- Create
WorldTSDataAllDateLong
fromWorldTSDataAllDate
- Now I restructure (pivot) to create
WorldTSDataAllDateLong
. These data are exported into the processed data folder
Below I create two new graphs with gganimate
using global and national data.
- Create World data (animated) =
WorldTSIncrementLong
- This requires an
incremental
dataset that calculates newcases
,deaths
, andrecovered
patients withdplyr::lag()
status variable - dplyr::lag(status variable, 1)
- Here I filter the
WorldTSDataAll
to only the US (WorldTSDataUS
) and renameprovince_state
andcountry_region
country_region
=="US"
- Create USA data (animated) =
USTSDataAllIncrementLong
- then I create an incremental dataset for US states by grouping by
state
, calculating thelag
(betweenmetric - dplyr::lag(metric)
), then summarizing bydate
- Now I use the incremental dataset to animate the
ggplot
usinggganimate
- finally I build the animation, first with
ggplot2
, then pass the plot object togganimate::transition_reveal(date)
and assign theggplot2::labs()
::dir_ls(paste0("data/processed/",
fs::noquote(lubridate::today())),
baseregexp = "WorldTSData")
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAll.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAllDate.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAllDateLong.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataUS.csv
Tab 1: Global COVID-19 Cases (Animated)
c("#B22222", # confirmed
colors <-"gray65", # deaths
"green4") # recovered
WorldTSIncrementLong %>%
world_cum_cases <-
ggplot2::ggplot(mapping = aes(x = date,
y = increment,
group = case,
color = case)) +
ggplot2::geom_line(show.legend = FALSE) +
ggplot2::scale_y_continuous(labels = scales::label_number_si(accuracy = 1)) +
ggplot2::scale_color_manual(values = colors) +
ggplot2::geom_segment(aes(xend = max(date) - 1,
yend = increment),
linetype = "dashed",
size = 0.5,
colour = "grey75",
show.legend = FALSE) +
# this adds the labels to the graph
ggplot2::geom_text(aes(x = max(date) + 0.2,
label = case),
nudge_x = -7.5,
show.legend = FALSE,
hjust = 0) +
# this adds the theme/font
ggthemes::theme_few(base_size = 10, base_family = "Ubuntu") +
# set the coordinates
ggplot2::coord_cartesian(
xlim = c(min(WorldTSIncrementLong$date),
max(WorldTSIncrementLong$date) + 7),
ylim = c(max(0, min(WorldTSIncrementLong$increment)),
max(WorldTSIncrementLong$increment)),
clip = "off") +
ggplot2::theme(legend.position = c(0.1, 0.8),
# remove x axis title
axis.title.x = element_blank()) +
# remove guides
ggplot2::guides(size = FALSE) +
# add the points
ggplot2::geom_point(aes(size = increment),
alpha = 0.7,
show.legend = FALSE) +
# add scale size
ggplot2::scale_size(range = c(2, 10)) +
# this is the transition (x axis variable)
gganimate::transition_reveal(date) +
ggplot2::labs(title = "New Global COVID-19 Cases",
subtitle = "Date: {frame_along}",
y = "New daily cases",
x = "Date")
gganimate::animate(world_cum_cases,
animate_world_cum_cases <-nframes = 150,
fps = 10,
rewind = TRUE,
renderer = gifski_renderer(loop = TRUE))
Now I save and render.
# and save
::anim_save(filename =
gganimate::paste0(base::noquote(lubridate::today()),
base"-animate_world_cum_cases.gif"),
animation = last_animation(),
path = "figs/")
::include_graphics(path =
knitr::paste0("figs/",
base::noquote(lubridate::today()),
base"-animate_world_cum_cases.gif"))
Tab 2: Global COVID-19 Cases (Cumulative)
# set colors
c("#B22222", # confirmed
colors <-"gray65", # deaths
"green4") # recovered
# font style
list(
font_style <-family = "Ubuntu",
size = 14,
color = 'black')
# create base chart
WorldTSDataAllDateLong %>%
world_cum_point_chart <- ggplot2::ggplot(aes(x = date,
y = cases,
color = status)) +
geom_point(size = 1, alpha = 2/5) +
scale_color_manual(values = colors) +
scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
theme(
plot.margin = margin(0, 0, 0, 0, "pt"),
panel.background = element_rect(fill = "White"),
legend.position = "left",
axis.title = element_blank(),
axis.ticks = element_blank()) +
hrbrthemes::theme_ipsum_tw(plot_title_family = "Ubuntu") +
labs(title = "Global COVID-19 Cumulative Cases",
y = "Cases",
x = "Date",
color = " ")
# pass over to plotly
ggplotly(world_cum_point_chart) %>%
plotly::layout(legend = list(orientation = "h"),
font = font_style)
Page 3: US COVID-19 Data (Maps)
This page is the US time-series COVID-19 data.
Column 1: data-width=300 .bg-secondary
Total US confirmed cases as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$confirmed_sum SumUSRecentCountry
#> [1] 3711413
### `r paste0("Total US confirmed cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$confirmed_sum, big.mark = ","), color = "#B22222")
New US cases as of …
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$new_case_sum SumUSRecentCountry
#> [1] 63698
### `r paste0("New US cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$new_case_sum, big.mark = ","), color = "#EE4000")
US recovered cases as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$recovered_sum SumUSRecentCountry
#> [1] 1122720
### `r paste0("US recovered cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$recovered_sum, big.mark = ","), color = "#EEE9E9")
US deaths as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$deaths_sum SumUSRecentCountry
#> [1] 140119
### `r paste0("US deaths as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$deaths_sum, big.mark = ","), color = "#00FF7F")
Days since first confirmed case on…
us_first_case_day
#> [1] "2020-01-22"
us_days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case on ", us_first_case_day)`
valueBox(prettyNum(us_days_passed, big.mark = ","), color = "#FFFFF0")
Column 2: data-width=700 .tabset
Tab 1: United States Confirmed Cases
This is a map of the COVID-19 data for the US using the county-level data from the USTSDataAll
dataset, but I need to reduce this to the 51 states in the continental US.
The data for these are stored in the confirmed_us
vector.
head(confirmed_us, 10)
#> Alabama Alaska Arizona Arkansas California Colorado
#> 65234 1796 141265 32533 380745 39770
#> Connecticut Delaware Florida Georgia
#> 47893 13429 337569 139880
These are paired with the state.name
and state.abb
vectors from the maps package. Read more about how to create these in the plotly-r book.
list(
us_map_layout <-scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = confirmed_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "Current US Confirmed Cases")
The same map is created below using the total confirmed deaths.
head(deaths_us, 10)
#> Alabama Alaska Arizona Arkansas California Colorado
#> 1286 18 2730 357 7702 1752
#> Connecticut Delaware Florida Georgia
#> 4396 523 4895 3169
Tab 2: United States Deaths
list(
us_map_layout <-scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = deaths_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "Current US Deaths")
Tab 2: United States New Cases
Below is a map of new US cases.
head(new_case_us, 10)
#> Alabama Alaska Arizona Arkansas California Colorado
#> 2143 62 2742 771 7486 444
#> Connecticut Delaware Florida Georgia
#> 0 92 10328 4688
list(
us_map_layout <-scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = new_case_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "New US Cases")
Tab 4: US Map Data
library(reactable)
SumUSDataMap %>%
data <- dplyr::select(State = state,
Date = date,
Confirmed = confirmed_sum,
`New Cases` = new_case_sum,
Deaths = deaths_sum) %>%
dplyr::arrange(desc(Confirmed))
::reactable(data,
reactabledefaultSorted = "Confirmed",
columns = list(
State = colDef(
name = "State",
format = colFormat(prefix = "")
),Date = colDef(
name = "Date",
format = colFormat(separators = TRUE)
),Confirmed = colDef(
name = "Confirmed",
defaultSortOrder = "desc",
format = colFormat(separators = TRUE)
),`New Cases` = colDef(
name = "New Cases",
format = colFormat(separators = TRUE)
),Deaths = colDef(
name = "Deaths",
format = colFormat(separators = TRUE)
)
) )
Page 4: US COVID-19 Data (Graphs)
Column 1: data-width=300 .bg-secondary
Total US confirmed cases as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$confirmed_sum SumUSRecentCountry
#> [1] 3711413
### `r paste0("Total US confirmed cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$confirmed_sum, big.mark = ","), color = "#B22222")
New US cases as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$new_case_sum SumUSRecentCountry
#> [1] 63698
### `r paste0("New US cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$new_case_sum, big.mark = ","), color = "#EE4000")
US recovered cases as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$recovered_sum SumUSRecentCountry
#> [1] 1122720
### `r paste0("US recovered cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$recovered_sum, big.mark = ","), color = "#EEE9E9")
US deaths as of…
$date_max SumUSRecentCountry
#> [1] "2020-07-18"
$deaths_sum SumUSRecentCountry
#> [1] 140119
### `r paste0("US deaths as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$deaths_sum, big.mark = ","), color = "#00FF7F")
Days since first confirmed case on…
us_first_case_day
#> [1] "2020-01-22"
us_days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case on ", us_first_case_day)`
valueBox(prettyNum(us_days_passed, big.mark = ","), color = "#FFFFF0")
Column 2 data-width=700 .tabset
Tab 1: US New Cases (Animated) (.tabset
)
Now I can create the animated plot with ggplot2
, specify the animation (gganimate::transition_reveal(date)
), and pass it to the animation.
c("#B22222", # confirmed
colors <-"gray65", # deaths
"green4") # recovered
USTSDataAllIncrementLong %>%
us_cum_cases <- # format for the K y axis
dplyr::mutate(increment = increment/1000) %>%
ggplot2::ggplot(aes(x = date,
y = increment,
group = case,
color = case)) +
# add line
ggplot2::geom_line(show.legend = FALSE) +
ggplot2::scale_color_manual(values = colors) +
# add segment, no legend
ggplot2::geom_segment(aes(xend = max(date) - 1,
yend = increment),
linetype = "dashed",
size = 0.5,
color = "grey75",
show.legend = FALSE) +
# add text, no legend
ggplot2::geom_text(aes(x = max(date) + 0.2,
label = case),
hjust = 0,
nudge_x = -8.0,
show.legend = FALSE) +
# set theme
ggthemes::theme_few(base_size = 10, base_family = "Ubuntu") +
# set cartesian coordinates to min/max dates
ggplot2::coord_cartesian(xlim = c(min(USTSDataAllIncrementLong$date),
max(USTSDataAllIncrementLong$date) + 7),
clip = "off") +
# position the legend
ggplot2::theme(legend.position = c(0.1, 0.8),
# no x axis title
axis.title.x = element_blank()) +
# no guides
ggplot2::guides(size = FALSE) +
ggplot2::geom_point(aes(size = increment),
alpha = 0.7,
show.legend = FALSE) +
ggplot2::scale_size(range = c(2, 10)) +
# set transition
gganimate::transition_reveal(date) +
# assign labs
ggplot2::labs(title = "US COVID-19 Cases",
subtitle = "at date: {frame_along}",
y = "New Cases",
color = "Status",
x = "Date") +
ggplot2::scale_y_continuous(label = scales::unit_format(unit = "K"))
gganimate::animate(us_cum_cases,
animated_us_cum_cases <-nframes = 150,
fps = 10,
rewind = TRUE,
renderer = gifski_renderer(loop = TRUE))
# and save
::anim_save(filename =
gganimate::paste0(base::noquote(lubridate::today()),
base"-animated_us_cum_cases.gif"),
animation = last_animation(),
path = "figs/")
::include_graphics(path =
knitr::paste0("figs/",
base::noquote(lubridate::today()),
base"-animated_us_cum_cases.gif"))
Column 3: data-width=700 .tabset
These are the geofacet
plots, built with data from the covdata
package.
The covdata
package
I’ll be using data from the covdata package by Kieran Healy. The goal with this package is to build an set of graphs that the user can select an input (selectInput()
) from a list of metrics, and see that metric reflected across all 50 states.
The datasets I’ll be using are covus
and nytcovstate
. The script below imports and wrangles these data.
# fs::dir_ls("code")
source("code/01.1-import-wrangle-geofacet.R")
The covus
data is a tidy dataset, with a date for each day, and each metric in the measure
variable.
::paged_table(
rmarkdown%>% head()) Covus
A cleaner version of the measure
variable is stored in the measure_label
variable.
::paged_table(
rmarkdown::distinct(Covus, measure_label)) dplyr
::paged_table(
rmarkdown::distinct(Covus, measure)) dplyr
Now I add the necessary date variables grouped by state
, remove the regions not included in the geofacet
, and put these in a dataset called MapCovus
.
::paged_table(
rmarkdown%>% head()) MapCovus
I then created a table for positive tests called PosMapCovus
.
::paged_table(
rmarkdown%>% head()) PosMapCovus
This was indexed on Positive Test Metric
and Positive Test Value
.
::paged_table(
rmarkdown%>% head()) TidyPosMapCovus
The NYTCovState
data has cases
and deaths
.
::paged_table(
rmarkdown%>% head()) NYTCovState
The NYTCovState
was joined to TidyPosMapCovus
to create the TidyCovDeathData
data, which I use to compare deaths between NYT and COVID-tracking project.
::paged_table(
rmarkdown%>% head()) TidyCovDeathData
I’ve covered how to create the geofacet
graphs in this storybench post.
Tab 2: Positive Tests
These are the positive tests from the COVID tracking project
data.
PosMapCovus %>%
geofacet_pos <- # adjust for scales on y axis
dplyr::mutate(`positive tests` = `positive tests`/1000) %>%
# plot this with new adjusted positive tests
ggplot2::ggplot(aes(x = days_elapsed,
y = `positive tests`,
group = date)) +
geom_col(alpha = 2/10,
linetype = 0) +
ggplot2::geom_line(data = TidyPosMapCovus,
mapping = aes(x = days_elapsed,
y = `Positive Test Value`/1000,
group = `Positive Test Metric`,
color = `Positive Test Metric`),
show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid2",
scales = "free_y") +
ggplot2::labs(title = "US positive COVID tests (7-day rolling average)",
subtitle = paste0("Between ",
min(PosMapCovus$date),
" and ",
max(PosMapCovus$date)),
caption = "SOURCE: https://covidtracking.com/",
y = "New Positive Tests",
x = "Days Elapsed") +
scale_y_continuous(label = scales::unit_format(unit = "K")) +
ggthemes::theme_tufte(base_size = 10, base_family = "Ubuntu") +
ggplot2::theme(axis.text.x = element_text(angle = 315),
legend.position = "top")
geofacet_pos
ggsave(plot = geofacet_pos,
filename = "figs/geofacet_pos.png",
device = "png",
dpi = "retina",
width = 16,
height = 10,
units = "in",
limitsize = FALSE)
Tab 3: US Cases
I want to compare the NYT and COVID-19 tracking datasets (positive tests vs. NYT cases). This can be accomplished using the TidyCovCaseData
which has both cases from the NYT dataset, and the positive
measure from the COVID tracking project dataset.
TidyCovCaseData %>%
geofacet_cases <- # adjust for y scale formatting (`K`)
dplyr::mutate(`Cases Value` = `Cases Value`/1000) %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `Cases Value`,
group = `Cases Key`,
color = `Cases Key`)) +
ggplot2::geom_line(show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid2",
scales = "free_y") +
ggplot2::labs(title = "US COVID Cases",
subtitle = paste0("Between ",
min(TidyCovCaseData$date),
" and ",
max(TidyCovCaseData$date)),
y = "New Cases",
x = "Days Elapsed",
caption = "https://covidtracking.com | https://github.com/nytimes/covid-19-data") +
ggplot2::scale_y_continuous(label = scales::unit_format(unit = "K")) +
ggplot2::theme_minimal() +
ggplot2::theme(axis.text.x = element_text(angle = 315),
legend.position = "top")
geofacet_cases
ggsave(plot = geofacet_cases,
filename = "figs/geofacet_cases.png",
device = "png",
dpi = "retina",
width = 16,
height = 10,
units = "in",
limitsize = FALSE)
Tab 4: US Deaths
Finally, I can repeat this process, but compare deaths between the NYT and COVID-tracking datasets (which later I can turn into a selectInput
).
TidyCovDeathData %>%
geofacet_deaths <-
ggplot2::ggplot(aes(x = days_elapsed,
y = `Death Value`,
group = `Death Key`,
color = `Death Key`)) +
ggplot2::geom_line(show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid2",
scales = "free_y") +
::labs(title = "US COVID deaths",
ggplot2subtitle = paste0("Between ",
min(DeathsMapCovus$date),
" and ",
max(DeathsMapCovus$date)),
y = "New Positive Tests",
x = "Days Elapsed",
caption = "https://covidtracking.com | https://github.com/nytimes/covid-19-data") +
ggthemes::theme_tufte(base_size = 10, base_family = "Ubuntu") +
ggplot2::theme(axis.text.x = element_text(angle = 315),
legend.position = "top")
geofacet_deaths
ggsave(plot = geofacet_deaths,
filename = "figs/geofacet_deaths.png",
device = "png",
dpi = "retina",
width = 16,
height = 10,
units = "in",
limitsize = FALSE)
Page 5: Sources
Data Sources:
The time series data comes from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) COVID-19 dashboard data. You can find the time series files here. These files are updated daily with new cases, recovered, and deaths.
The country codes come from the GDP country codes in 2016. Will be updated as needed.
The covdata package from Kieran Healy contains data from the NYT database, COVID tracking project, and others.