Covid - flexdashboard (documentation)

Packages

These are the packages for data wrangling, import, and visualization.

# dashboard
library(flexdashboard)
library(knitr)
library(DT)
library(janitor)

# theme
library(brotools)
library(hrbrthemes)
library(ggthemes)
library(bbplot)

# data wrangling
library(tidyverse)
library(lubridate)
library(socviz)

# data visualization
library(plotly)
library(gganimate)
library(gifski)
library(skimr)
# custom skim
my_skim <- skimr::skim_with(
  numeric = skimr::sfl(p25 = NULL, p75 = NULL))

# map
library(rmapshaper)
library(sugarbag)

Updates (version 0.1.4)

  • Wrangling steps have been moved into helpers.R script (includes both import and wrangle scripts from the code folder)
  • Replace all use of worldwide with global
  • Color background of global maps set to "whitesmoke"
  • Global maps for New Cases (orthographic) and Recovered (Mercator)
  • Change contents of valueBox()s to sentence case
  • new colors in valueBox()’s
  • "#B22222", "#EE4000", "#EEE9E9", "#00FF7F", "#FFFFF0"
  • Add geofacet maps and use covdata package for comparing NYT data with COVID tracking project
  • Fix geofacet units on y axes (non-scientific notation)
  • add bbplot theme for animated graphs

To-Do/Changes coming

  • investigate git warning for large file storage:
# remote: warning: GH001: Large files detected. You may want to try Git Large File
#         Storage - https://git-lfs.github.com.
# remote: warning: See http://git.io/iEPt8g for more information.
# remote: warning: File data/processed/2020-07-19/2020-07-19-USTSDataAll.csv is 
#                  52.00 MB; this is larger than GitHub's recommended maximum file
#                  size of 50.00 MB
# To https://github.com/mjfrigaard/covid-dashboard.git
#   2973303..a2486af  master -> master
  • Already installed Git Large File Storage (LFS) using,

    brew install git-lfs
    # Update global git config
    git lfs install
    # Update system git config
    sudo -i
    git lfs install --system
  • Add county maps

  • Include Rt for each county

  • Pretty up the reactable on the global data


DATA: Johns Hopkins University’s CSSE

The COVID-19 data objects are loaded below from various sources.

Some of these are updated daily, others are pre-existing datasets for geographic locators or other variables to aid in visualizing/mapping.

Import CSSE time series data from Github

This imports data from the CSSEGISandData. These are in the csse_covid_19_time_series folder. These files are updated daily.

# fs::dir_ls("code/")
source("code/01.0-import-csse-time-series.R")

The following info comes from the README in the time series repo folder:

## Time series summary (csse_covid_19_time_series)

This folder contains daily time series summary tables, including confirmed, 
deaths and recovered. All data is read in from the daily case report. 
The time series tables are subject to be updated if inaccuracies are identified
in our historical data. The daily reports will not be adjusted in these 
instances to maintain a record of raw data.

Two time series tables are for the US confirmed cases and deaths, reported at
the county level. They are named `time_series_covid19_confirmed_US.csv`, 
`time_series_covid19_deaths_US.csv`, respectively.

Three time series tables are for the global confirmed cases, recovered cases 
and deaths. Australia, Canada and China are reported at the province/state 
level. Dependencies of the Netherlands, the UK, France and Denmark are listed 
under the province/state level. The US and other countries are at the country
level. The tables are renamed `time_series_covid19_confirmed_global.csv` and 
`time_series_covid19_deaths_global.csv`, and 
`time_series_covid19_recovered_global.csv`, respectively.

### Update frequency

Once a day around 23:59 (UTC).

### Deprecated warning

The files below were archived here, and will no longer be updated. With the 
release of the new data structure, we are updating our time series tables to
reflect these changes. 

Please reference  `time_series_covid19_confirmed_global.csv` and 
`time_series_covid19_deaths_global.csv` for the latest time series data.

`time_series_19-covid-Confirmed.csv`
`time_series_19-covid-Deaths.csv`
`time_series_19-covid-Recovered.csv`

Export the raw csse covid 19 time series data (global)

All of the raw data files are in this list

names(covid_raw_data_files)
#> [1] "GDPRaw"           "TSConfirmedRaw"   "TSConfirmedUSRaw" "TSDeathsRaw"     
#> [5] "TSDeathsUSRaw"    "TSRecoveredRaw"

I’ve dropped each datasets into a raw folder with a date stamp for safe keeping.

fs::dir_tree(paste0("data/raw/", 
                    base::noquote(lubridate::today())), 
             recurse = FALSE)
#> data/raw/2020-07-19
#> ├── 2020-07-19-GDPRaw.csv
#> ├── 2020-07-19-TSConfirmedRaw.csv
#> ├── 2020-07-19-TSConfirmedUSRaw.csv
#> ├── 2020-07-19-TSDeathsRaw.csv
#> ├── 2020-07-19-TSDeathsUSRaw.csv
#> └── 2020-07-19-TSRecoveredRaw.csv

Wrangling CSSE time series data

The code chunk below runs the script for the wrangling steps to create the data visualizations using the time series CSSE data.

# fs::dir_ls("code/")
source("code/02.0-wrangle-csse-time-series.R")

The following steps were taken to wrangle the time series data:

  1. Convert wide to long (Confirmed, Recovered, Deaths)
  • first I converted TSConfirmedRaw dataset to long form, and converts the Date variable to mdy()
  1. Create WorldTSDataAll by joining Confirmed, Recovered, Deaths
  • This joins the Confirmed, Recovered, and Deaths together into WorldTSDataAll
  1. USTSDataAll = join ConfirmedUS and DeathsUS
  • I want to mimic what I did with the WorldTSDataAll and join these two together. I want country_region to just be named country, and province_state to just be named state. Export these files to processed folder
  1. Create SumRegionDate
  • this groups the WorldTSDataAll data by country_region and date, then summarizes the confirmed_sum, recovered_sum, and deaths_sum variables. Then it creates a “new case” column with dplyr::lag() with confirmed_sum and filters the date to max(date)
  • create a most recent day from SumRegionDate called recent_day
  1. GDP Country Codes: create a smaller version of the GDPRaw dataset. I also rename some of the regions in Gdp2016
Gdp2016 %>% 
  dplyr::filter(str_detect(string = region, pattern = "Gambia, The"))
#> # A tibble: 1 x 4
#>   region      code   year country_region
#>   <chr>       <chr> <dbl> <chr>         
#> 1 Gambia, The GMB    2016 Gambia
  1. CreateSumRegionDateCodes by joining SumRegionDate and Gdp2016
  • Join the SumRegionDate to the Gdp2016 data country. And because this is the first complete dataset I will be using for data visualizations, I will export this into the data/processed folder
fs::dir_ls(paste0("data/processed/", 
                    base::noquote(lubridate::today())),
           regexp = "SumRegionDate.csv")
#> data/processed/2020-07-19/2020-07-19-SumRegionDate.csv

Dashboard layout

This section covers the data visualizations in the dashboard. Because the dashboard is built using multiple pages, with orientation: columns and vertical_layout: fill. Read more about this layout here.

Page 1: Global COVID-19 Data (Maps)

Page 1 is titled, Global COVID-19 Data (Maps). This contains the data from the WorldTSRecent dataset, the

Column 1: data-width=300 .bg-secondary

Column {data-width=250 .bg-secondary}
-----------------------------------------------------------------------

These are built using the following valueBox()s. The dataset WorldTSRecent is below:

rmarkdown::paged_table(
head(WorldTSRecent)
)

This is a tibble with a single row for the most recent stats.

The valueBox() and prettyNum() functions display the objects below:

Total global confirmed cases as of…
WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$confirmed_sum
#> [1] 14288689
### `r paste0("Total global confirmed cases as of ", WorldTSRecent$date)` 

valueBox(prettyNum(WorldTSRecent$confirmed_sum, big.mark = ","), color = "tomato")
New global cases as of…
WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$`New Case`
#> [1] 233390
### `r paste0("New global cases as of ", WorldTSRecent$date)`

valueBox(prettyNum(WorldTSRecent$`New Case`, big.mark = ","), color = "sandybrown")
Global deaths as of…
WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$deaths_sum
#> [1] 602138
### `r paste0("Global deaths as of ", WorldTSRecent$date)` 

valueBox(prettyNum(WorldTSRecent$deaths_sum, big.mark = ","), color = "gray50")
Global recovered cases as of…
WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$recovered_sum
#> [1] 7904159
### `r paste0("Global recovered cases as of ", WorldTSRecent$date)` 

valueBox(prettyNum(WorldTSRecent$recovered_sum, big.mark = ","), color = "palegreen")
Days since first confirmed case at…
WorldTSRecent$date
#> [1] "2020-07-18"
case_no1
#> [1] "2020-01-22"
days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case at ", case_no1)` 

valueBox(prettyNum(days_passed, big.mark = ","), color = "lightgoldenrodyellow")

Column 2: data-width=700 .tabset

Column {data-width=700 .tabset}
-----------------------------------------------------------------------

This is the plotly::plot_geo() world map. The dataset that it requires is SumRegionDateCodes, and it’s available to view below:

rmarkdown::paged_table(
head(SumRegionDateCodes)
)

Tab 1: Global Confirmed Cases (.tabset)

The visualization uses the plotly::plot_geo() function, which renders a full interactive globe!

# create recent_day
recent_day <- max(SumRegionDateCodes$date)
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)

# create geo for map options
geo <- list(
  bgcolor = "whitesmoke",
  showframe = FALSE,
  showcoastlines = FALSE,
  # this is the globe option
  projection = list(type = "orthographic"),
  resolution = "100",
  showcountries = TRUE,
  showocean = TRUE,
  showlakes = FALSE,
  showrivers = FALSE)

geo_map_confirm_cases <- plotly::plot_geo() %>%
  layout(
    geo = geo,
    paper_bgcolor = "whitesmoke",
    title = paste0("Global COVID-19 confirmed cases as of ", 
                   recent_day)) %>%
  add_trace(
    data = SumRegionDateCodes,
    z = ~Confirmed,
    color = ~Confirmed,
    colors = "Reds",
    text = ~country_region,
    locations = ~code,
    marker = list(line = line))

geo_map_confirm_cases

Tab 2: Global New Cases (.tabset)

# create recent_day
recent_day <- max(SumRegionDateCodes$date)
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)

# create geo for map options
geo <- list(
  bgcolor = "whitesmoke",
  showframe = FALSE,
  showcoastlines = FALSE,
  # this is the globe option
  projection = list(type = "orthographic"),
  resolution = "100",
  showcountries = TRUE,
  showocean = TRUE,
  showlakes = FALSE,
  showrivers = FALSE)

geo_map_new_cases <- plotly::plot_geo() %>%
  layout(
    geo = geo,
    paper_bgcolor = "whitesmoke",
    title = paste0("Global COVID-19 new cases as of ", 
                   recent_day)) %>%
  add_trace(
    data = SumRegionDateCodes,
    z = ~`New Cases`,
    color = ~`New Cases`,
    colors = "Oranges",
    text = ~country_region,
    locations = ~code,
    marker = list(line = line))

geo_map_new_cases

Tab 3: Global Deaths (.tabset)

This is now a Mercator map of the confirmed deaths from COVID.

# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)

# create geo for map options
# c("#B0E0E6", "#F0FFF0")
geo <- list(
  oceancolor = "whitesmoke",
  showframe = FALSE,
  showcoastlines = FALSE,
  # this is the mercator option
  projection = list(type = 'Mercator'),
  resolution = "100",
  showcountries = TRUE,
  showocean = TRUE,
  showlakes = FALSE,
  showrivers = FALSE)

geo_map_deaths <- plotly::plot_geo() %>%
  layout(
    geo = geo,
    paper_bgcolor = "whitesmoke",
    title = paste0("Global COVID-19 deaths as of ", 
                   recent_day)) %>%
  add_trace(
    data = SumRegionDateCodes,
    z = ~Deaths,
    color = ~Deaths,
    colors = "Greys",
    text = ~country_region,
    locations = ~code,
    marker = list(line = line))

geo_map_deaths

#### Tab 4: Global Recovered Cases (.tabset)

# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)

# create geo for map options
# c("#B0E0E6", "#F0FFF0")
geo <- list(
  oceancolor = "aliceblue",
  showframe = FALSE,
  showcoastlines = FALSE,
  # this is the mercator option
  projection = list(type = 'Mercator'),
  resolution = "100",
  showcountries = TRUE,
  showocean = TRUE,
  showlakes = FALSE,
  showrivers = FALSE)

geo_map_recovered <- plotly::plot_geo() %>%
  layout(
    geo = geo,
    paper_bgcolor = "aliceblue",
    title = paste0("World COVID-19 recovered cases as of ", 
                   recent_day)) %>%
  add_trace(
    data = SumRegionDateCodes,
    z = ~Recovered,
    color = ~Recovered,
    colors = "Greens",
    text = ~country_region,
    locations = ~code,
    marker = list(line = line))

geo_map_recovered

Tab 5: Dataset (Old)

This is the old table with DT.

SumRegionDateCodes %>%
  dplyr::select(`Country region` = country_region,
          `Country code` = code,
          Date = date,
          Confirmed,
          `New Cases`,
          Recovered,
          Deaths) %>%
  dplyr::arrange(desc(Confirmed)) %>%
  DT::datatable(
    rownames = FALSE,
    fillContainer = TRUE,
    options = list(
      bPaginate = FALSE))

Tab 5: Dataset (new)

It has been converted to a reactable.

library(reactable)
data <- SumRegionDateCodes %>%
  dplyr::select(Country = country_region,
                `Country code` = code,
                 Date = date,
                 Confirmed,
                 `New Cases`,
                  Recovered,
                  Deaths) %>%
  dplyr::arrange(desc(Confirmed)) 
reactable::reactable(data,
  defaultSorted = "Confirmed",
  columns = list(
    Confirmed = colDef(
      name = "Confirmed",
      defaultSortOrder = "desc",
      format = colFormat(prefix = "")
    ),
    Country = colDef(
      name = "Country",
      defaultSortOrder = "desc",
      format = colFormat(separators = TRUE)
    ),
    Date = colDef(
      name = "Date",
      defaultSortOrder = "desc",
      format = colFormat(separators = TRUE)
      # format = colFormat(percent = TRUE, digits = 1)
    )
  )
)

To do: Convert this table to sparkline and add histograms.

Page 2: Global COVID-19 Cases (Graphs)

Column 1: data-width=300 .bg-secondary

Total global confirmed cases as of…

WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$confirmed_sum
#> [1] 14288689
### `r paste0("Total global confirmed cases as of ", WorldTSRecent$date)` 

valueBox(prettyNum(WorldTSRecent$confirmed_sum, big.mark = ","), color = "#B22222")
14,288,689

New global cases as of…

WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$`New Case`
#> [1] 233390
### `r paste0("New global cases as of ", WorldTSRecent$date)`

valueBox(prettyNum(WorldTSRecent$`New Case`, big.mark = ","), color = "#EE4000")

Global deaths as of…

WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$deaths_sum
#> [1] 602138
### `r paste0("Global deaths as of ", WorldTSRecent$date)` 

valueBox(prettyNum(WorldTSRecent$deaths_sum, big.mark = ","), color = "#EEE9E9")

Global recovered cases as of

WorldTSRecent$date
#> [1] "2020-07-18"
WorldTSRecent$recovered_sum
#> [1] 7904159
### `r paste0("Global recovered cases as of ", WorldTSRecent$date)` 

valueBox(prettyNum(WorldTSRecent$recovered_sum, big.mark = ","), color = "#00FF7F")

Days since first confirmed case at…

case_no1
#> [1] "2020-01-22"
days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case at ", case_no1)` 

valueBox(prettyNum(days_passed, big.mark = ","), color = "#FFFFF0")

Column 2: data-width=700 .tabset

Data inputs:

  1. Create WorldTSDataAllDate and WorldTSDataRecent
  • WorldTSDataAllDate is WorldTSDataAll grouped by date
  • This groups by the date column, the summarized the confirmed, deaths, and recovered
  1. Create WorldTSDataAllDateLong from WorldTSDataAllDate
  • Now I restructure (pivot) to create WorldTSDataAllDateLong. These data are exported into the processed data folder

Below I create two new graphs with gganimate using global and national data.

  1. Create World data (animated) = WorldTSIncrementLong
  • This requires an incremental dataset that calculates new cases, deaths, and recovered patients with dplyr::lag()
    status variable - dplyr::lag(status variable, 1)
  • Here I filter the WorldTSDataAll to only the US (WorldTSDataUS) and rename province_state and country_region
    country_region == "US"
  1. Create USA data (animated) = USTSDataAllIncrementLong
  • then I create an incremental dataset for US states by grouping by state, calculating the lag (between metric - dplyr::lag(metric)), then summarizing by date
  1. Now I use the incremental dataset to animate the ggplot using gganimate
  • finally I build the animation, first with ggplot2, then pass the plot object to gganimate::transition_reveal(date) and assign the ggplot2::labs()
fs::dir_ls(paste0("data/processed/", 
                    base::noquote(lubridate::today())),
           regexp = "WorldTSData")
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAll.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAllDate.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAllDateLong.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataUS.csv
Tab 1: Global COVID-19 Cases (Animated)
colors <- c("#B22222", # confirmed
            "gray65", # deaths
            "green4") # recovered

world_cum_cases <- WorldTSIncrementLong %>%

  ggplot2::ggplot(mapping = aes(x = date,
                  y = increment,
                  group = case,
                  color = case)) +
  
  ggplot2::geom_line(show.legend = FALSE) +
  
  ggplot2::scale_y_continuous(labels = scales::label_number_si(accuracy = 1)) +
  
  ggplot2::scale_color_manual(values = colors) +
  
  ggplot2::geom_segment(aes(xend = max(date) - 1,
                            
                            yend = increment),
                        
                             linetype = "dashed",
                        
                             size = 0.5,
                        
                             colour = "grey75",
                        
                             show.legend = FALSE) +
  # this adds the labels to the graph
  ggplot2::geom_text(aes(x = max(date) + 0.2, 
                         
                        label = case), 
                     
                     nudge_x = -7.5, 
                     
                     show.legend = FALSE, 
                     
                     hjust = 0) +
  # this adds the theme/font
  ggthemes::theme_few(base_size = 10, base_family = "Ubuntu") +
  # set the coordinates
  ggplot2::coord_cartesian(
    xlim = c(min(WorldTSIncrementLong$date),
             max(WorldTSIncrementLong$date) + 7),
    ylim = c(max(0, min(WorldTSIncrementLong$increment)),
             max(WorldTSIncrementLong$increment)),
    clip = "off") +
  
  ggplot2::theme(legend.position = c(0.1, 0.8),
                 # remove x axis title
                  axis.title.x = element_blank()) +
  # remove guides
  ggplot2::guides(size = FALSE) +
  # add the points 
  ggplot2::geom_point(aes(size = increment),
                      
             alpha = 0.7,
             
             show.legend = FALSE) +
  # add scale size 
  ggplot2::scale_size(range = c(2, 10)) +
  # this is the transition (x axis variable)
  gganimate::transition_reveal(date) +
  
  ggplot2::labs(title = "New Global COVID-19 Cases",
                subtitle = "Date: {frame_along}",
                        y = "New daily cases",
                        x = "Date")

animate_world_cum_cases <- gganimate::animate(world_cum_cases, 
                                              nframes = 150,
                                              fps = 10,
                                              rewind = TRUE,
                   renderer = gifski_renderer(loop = TRUE))

Now I save and render.

# and save
gganimate::anim_save(filename =
                       base::paste0(base::noquote(lubridate::today()),
                                       "-animate_world_cum_cases.gif"),
                     animation = last_animation(),
                     path = "figs/")
knitr::include_graphics(path =
                          base::paste0("figs/",
                                       base::noquote(lubridate::today()),
                                                "-animate_world_cum_cases.gif"))

Tab 2: Global COVID-19 Cases (Cumulative)
# set colors
colors <- c("#B22222", # confirmed
            "gray65", # deaths
            "green4") # recovered
# font style
font_style <- list(
  family = "Ubuntu",
  size = 14,
  color = 'black')
# create base chart
world_cum_point_chart <- WorldTSDataAllDateLong %>%
  ggplot2::ggplot(aes(x = date,
                      y = cases,
                      color = status)) +
  geom_point(size = 1, alpha = 2/5) +
  scale_color_manual(values = colors) +
  scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
    theme(
    plot.margin = margin(0, 0, 0, 0, "pt"),
    panel.background = element_rect(fill = "White"),
    legend.position = "left",
    axis.title = element_blank(),
    axis.ticks = element_blank()) +
  hrbrthemes::theme_ipsum_tw(plot_title_family = "Ubuntu") +
  labs(title = "Global COVID-19 Cumulative Cases",
       y = "Cases",
       x = "Date",
       color = " ")
# pass over to plotly
ggplotly(world_cum_point_chart) %>%
  plotly::layout(legend = list(orientation = "h"),
                 font = font_style)

Page 3: US COVID-19 Data (Maps)

This page is the US time-series COVID-19 data.

Column 1: data-width=300 .bg-secondary

Total US confirmed cases as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$confirmed_sum
#> [1] 3711413
### `r paste0("Total US confirmed cases as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$confirmed_sum, big.mark = ","), color = "#B22222")

New US cases as of …

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$new_case_sum
#> [1] 63698
### `r paste0("New US cases as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$new_case_sum, big.mark = ","), color = "#EE4000")

US recovered cases as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$recovered_sum
#> [1] 1122720
### `r paste0("US recovered cases as of ", SumUSRecentCountry$date_max)` 


valueBox(prettyNum(SumUSRecentCountry$recovered_sum, big.mark = ","), color = "#EEE9E9")

US deaths as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$deaths_sum
#> [1] 140119
### `r paste0("US deaths as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$deaths_sum, big.mark = ","), color = "#00FF7F")

Days since first confirmed case on…

us_first_case_day
#> [1] "2020-01-22"
us_days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case on ", us_first_case_day)` 

valueBox(prettyNum(us_days_passed, big.mark = ","), color = "#FFFFF0")

Column 2: data-width=700 .tabset

Tab 1: United States Confirmed Cases

This is a map of the COVID-19 data for the US using the county-level data from the USTSDataAll dataset, but I need to reduce this to the 51 states in the continental US.

The data for these are stored in the confirmed_us vector.

head(confirmed_us, 10)
#>     Alabama      Alaska     Arizona    Arkansas  California    Colorado 
#>       65234        1796      141265       32533      380745       39770 
#> Connecticut    Delaware     Florida     Georgia 
#>       47893       13429      337569      139880

These are paired with the state.name and state.abb vectors from the maps package. Read more about how to create these in the plotly-r book.

us_map_layout <- list(
  scope = 'usa',
  lakecolor = "#3399FF",
  projection = list(type = 'albers usa'))

plot_geo() %>%
  add_trace(
    z = confirmed_us, 
    text = state.name, 
    span = I(0),
    locations = state.abb, 
    locationmode = 'USA-states') %>%
  layout(geo = us_map_layout,
         title = "Current US Confirmed Cases")

The same map is created below using the total confirmed deaths.

head(deaths_us, 10)
#>     Alabama      Alaska     Arizona    Arkansas  California    Colorado 
#>        1286          18        2730         357        7702        1752 
#> Connecticut    Delaware     Florida     Georgia 
#>        4396         523        4895        3169

Tab 2: United States Deaths

us_map_layout <- list(
  scope = 'usa',
  lakecolor = "#3399FF",
  projection = list(type = 'albers usa'))

plot_geo() %>%
  add_trace(
    z = deaths_us, 
    text = state.name, 
    span = I(0),
    locations = state.abb, 
    locationmode = 'USA-states') %>%
  layout(geo = us_map_layout,
         title = "Current US Deaths")

Tab 2: United States New Cases

Below is a map of new US cases.

head(new_case_us, 10)
#>     Alabama      Alaska     Arizona    Arkansas  California    Colorado 
#>        2143          62        2742         771        7486         444 
#> Connecticut    Delaware     Florida     Georgia 
#>           0          92       10328        4688
us_map_layout <- list(
  scope = 'usa',
  lakecolor = "#3399FF",
  projection = list(type = 'albers usa'))

plot_geo() %>%
  add_trace(
    z = new_case_us, 
    text = state.name, 
    span = I(0),
    locations = state.abb, 
    locationmode = 'USA-states') %>%
  layout(geo = us_map_layout,
         title = "New US Cases")

Tab 4: US Map Data

library(reactable)
data <- SumUSDataMap %>% 
  dplyr::select(State = state,
                Date = date,
                Confirmed = confirmed_sum,
                `New Cases` = new_case_sum,
                Deaths = deaths_sum) %>%
  dplyr::arrange(desc(Confirmed)) 
reactable::reactable(data,
  defaultSorted = "Confirmed",
  columns = list(
    State = colDef(
      name = "State",
      format = colFormat(prefix = "")
    ),
    Date = colDef(
      name = "Date",
      format = colFormat(separators = TRUE)
    ),
    Confirmed = colDef(
      name = "Confirmed",
      defaultSortOrder = "desc",
      format = colFormat(separators = TRUE)
    ),
    `New Cases` = colDef(
      name = "New Cases",
      format = colFormat(separators = TRUE)
    ),
      Deaths = colDef(
      name = "Deaths",
      format = colFormat(separators = TRUE)
    )
  )
)

Page 4: US COVID-19 Data (Graphs)

Column 1: data-width=300 .bg-secondary

Total US confirmed cases as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$confirmed_sum
#> [1] 3711413
### `r paste0("Total US confirmed cases as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$confirmed_sum, big.mark = ","), color = "#B22222")

New US cases as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$new_case_sum
#> [1] 63698
### `r paste0("New US cases as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$new_case_sum, big.mark = ","), color = "#EE4000")

US recovered cases as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$recovered_sum
#> [1] 1122720
### `r paste0("US recovered cases as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$recovered_sum, big.mark = ","), color = "#EEE9E9")

US deaths as of…

SumUSRecentCountry$date_max
#> [1] "2020-07-18"
SumUSRecentCountry$deaths_sum
#> [1] 140119
### `r paste0("US deaths as of ", SumUSRecentCountry$date_max)` 

valueBox(prettyNum(SumUSRecentCountry$deaths_sum, big.mark = ","), color = "#00FF7F")

Days since first confirmed case on…

us_first_case_day
#> [1] "2020-01-22"
us_days_passed
#> Time difference of 179 days
### `r paste0("Days since first confirmed case on ", us_first_case_day)` 

valueBox(prettyNum(us_days_passed, big.mark = ","), color = "#FFFFF0")

Column 2 data-width=700 .tabset

Tab 1: US New Cases (Animated) (.tabset)

Now I can create the animated plot with ggplot2, specify the animation (gganimate::transition_reveal(date)), and pass it to the animation.

colors <- c("#B22222", # confirmed
            "gray65", # deaths
            "green4") # recovered

us_cum_cases <- USTSDataAllIncrementLong %>%
  # format for the K y axis
  dplyr::mutate(increment = increment/1000) %>% 
  
  ggplot2::ggplot(aes(x = date,
             y = increment,
             group = case,
             color = case)) +
    # add line
  ggplot2::geom_line(show.legend = FALSE) +
  
  ggplot2::scale_color_manual(values = colors) +
  
  # add segment, no legend
    
  ggplot2::geom_segment(aes(xend = max(date) - 1,
                            
                            yend = increment),
                        
                        linetype = "dashed",
                        
                        size = 0.5,
                        
                        color = "grey75",
                        
                        show.legend = FALSE) +
    # add text, no legend
    
  ggplot2::geom_text(aes(x = max(date) + 0.2,
                         
                         label = case),
                     
                     hjust = 0,
                     
                     nudge_x = -8.0, 
                     
                     show.legend = FALSE) +
    # set theme
    ggthemes::theme_few(base_size = 10, base_family = "Ubuntu") +
    # set cartesian coordinates to min/max dates
    ggplot2::coord_cartesian(xlim = c(min(USTSDataAllIncrementLong$date),
                                    
                                    max(USTSDataAllIncrementLong$date) + 7),
                             
                           clip = "off") +
    # position the legend
  ggplot2::theme(legend.position = c(0.1, 0.8),
                 # no x axis title
                 axis.title.x = element_blank()) +
    # no guides
  ggplot2::guides(size = FALSE) +
  
  ggplot2::geom_point(aes(size = increment),
                      
                      alpha = 0.7,
                      
                      show.legend = FALSE) +

  ggplot2::scale_size(range = c(2, 10)) +
    # set transition
  gganimate::transition_reveal(date) +
    # assign labs
  ggplot2::labs(title = "US COVID-19 Cases",
                subtitle = "at date: {frame_along}",
                y = "New Cases",
                color = "Status",
                x = "Date") + 
  
   ggplot2::scale_y_continuous(label = scales::unit_format(unit = "K")) 

animated_us_cum_cases <- gganimate::animate(us_cum_cases,
                                            nframes = 150,
                                            fps = 10,
                                            rewind = TRUE,
        renderer = gifski_renderer(loop = TRUE))
# and save
gganimate::anim_save(filename =
                       base::paste0(base::noquote(lubridate::today()),
                                       "-animated_us_cum_cases.gif"),
                     animation = last_animation(),
                     path = "figs/")
knitr::include_graphics(path = 
                          base::paste0("figs/", 
                                       base::noquote(lubridate::today()),
                                                "-animated_us_cum_cases.gif"))

Column 3: data-width=700 .tabset

These are the geofacet plots, built with data from the covdata package.

The covdata package

I’ll be using data from the covdata package by Kieran Healy. The goal with this package is to build an set of graphs that the user can select an input (selectInput()) from a list of metrics, and see that metric reflected across all 50 states.

The datasets I’ll be using are covus and nytcovstate. The script below imports and wrangles these data.

# fs::dir_ls("code")
source("code/01.1-import-wrangle-geofacet.R")

The covus data is a tidy dataset, with a date for each day, and each metric in the measure variable.

rmarkdown::paged_table(
Covus %>% head())

A cleaner version of the measure variable is stored in the measure_label variable.

rmarkdown::paged_table(
dplyr::distinct(Covus, measure_label))
rmarkdown::paged_table(
dplyr::distinct(Covus, measure))

Now I add the necessary date variables grouped by state, remove the regions not included in the geofacet, and put these in a dataset called MapCovus.

rmarkdown::paged_table(
MapCovus %>% head())

I then created a table for positive tests called PosMapCovus.

rmarkdown::paged_table(
PosMapCovus %>% head())

This was indexed on Positive Test Metric and Positive Test Value.

rmarkdown::paged_table(
TidyPosMapCovus %>% head())

The NYTCovState data has cases and deaths.

rmarkdown::paged_table(
NYTCovState %>% head())

The NYTCovState was joined to TidyPosMapCovus to create the TidyCovDeathData data, which I use to compare deaths between NYT and COVID-tracking project.

rmarkdown::paged_table(
TidyCovDeathData %>% head())

I’ve covered how to create the geofacet graphs in this storybench post.

Tab 2: Positive Tests

These are the positive tests from the COVID tracking project data.

geofacet_pos <- PosMapCovus %>% 
    # adjust for scales on y axis
    dplyr::mutate(`positive tests` = `positive tests`/1000) %>% 
    # plot this with new adjusted positive tests
    ggplot2::ggplot(aes(x = days_elapsed, 
                        y = `positive tests`,
                        group = date)) +
  
    geom_col(alpha = 2/10, 
             linetype = 0) + 
  
    ggplot2::geom_line(data = TidyPosMapCovus, 
                       
    mapping = aes(x = days_elapsed, 
                  y = `Positive Test Value`/1000,
                  group = `Positive Test Metric`,
                  color = `Positive Test Metric`), 
                    show.legend = TRUE) +
      
    geofacet::facet_geo( ~ state, 
                         
                       grid = "us_state_grid2",
                       
                       scales = "free_y")  +
  
    ggplot2::labs(title = "US positive COVID tests (7-day rolling average)", 
                  
                  subtitle = paste0("Between ", 
                                    min(PosMapCovus$date), 
                                    " and ", 
                                    max(PosMapCovus$date)),
                  
                  caption = "SOURCE: https://covidtracking.com/",
                  
                  y = "New Positive Tests",
                  x = "Days Elapsed") + 
  
    scale_y_continuous(label = scales::unit_format(unit = "K")) +
  
    ggthemes::theme_tufte(base_size = 10, base_family = "Ubuntu") +
  
    ggplot2::theme(axis.text.x = element_text(angle = 315),
                 legend.position = "top") 

  

geofacet_pos

ggsave(plot = geofacet_pos, 
       filename = "figs/geofacet_pos.png", 
       device = "png", 
       dpi = "retina",
       width = 16, 
       height = 10, 
       units = "in", 
       limitsize = FALSE)

Tab 3: US Cases

I want to compare the NYT and COVID-19 tracking datasets (positive tests vs. NYT cases). This can be accomplished using the TidyCovCaseData which has both cases from the NYT dataset, and the positive measure from the COVID tracking project dataset.

geofacet_cases <- TidyCovCaseData %>% 
        # adjust for y scale formatting (`K`)
        dplyr::mutate(`Cases Value` = `Cases Value`/1000) %>% 
        
        ggplot2::ggplot(aes(x = days_elapsed, 
                            
                           y = `Cases Value`, 
                  
                           group = `Cases Key`,
                  
                           color = `Cases Key`)) +
  
    ggplot2::geom_line(show.legend = TRUE) +
          
    geofacet::facet_geo( ~ state, 
                       grid = "us_state_grid2",
                       scales = "free_y")  +
  
    ggplot2::labs(title = "US COVID Cases", 
                  
                  subtitle = paste0("Between ", 
                                    min(TidyCovCaseData$date), 
                                    " and ", 
                                    max(TidyCovCaseData$date)),
                  
                  y = "New Cases",
                  x = "Days Elapsed",
                  
caption = "https://covidtracking.com | https://github.com/nytimes/covid-19-data") + 
          
    ggplot2::scale_y_continuous(label = scales::unit_format(unit = "K")) +
  
    ggplot2::theme_minimal() +
  
    ggplot2::theme(axis.text.x = element_text(angle = 315),
                 legend.position = "top") 
  


geofacet_cases

ggsave(plot = geofacet_cases, 
       filename = "figs/geofacet_cases.png", 
       device = "png", 
       dpi = "retina",
       width = 16, 
       height = 10, 
       units = "in", 
       limitsize = FALSE)

Tab 4: US Deaths

Finally, I can repeat this process, but compare deaths between the NYT and COVID-tracking datasets (which later I can turn into a selectInput).

geofacet_deaths <- TidyCovDeathData %>%

        ggplot2::ggplot(aes(x = days_elapsed, 
                          
                           y = `Death Value`, 
                          
                           group = `Death Key`,
                          
                           color = `Death Key`)) +
  
        ggplot2::geom_line(show.legend = TRUE) +
  
        geofacet::facet_geo( ~ state, 
                             
                             grid = "us_state_grid2", 
                             scales = "free_y")  +
  
ggplot2::labs(title = "US COVID deaths", 
              subtitle = paste0("Between ", 
                                min(DeathsMapCovus$date), 
                                " and ", 
                                max(DeathsMapCovus$date)),
              
              y = "New Positive Tests",
              x = "Days Elapsed",
caption = "https://covidtracking.com | https://github.com/nytimes/covid-19-data") + 
      
       ggthemes::theme_tufte(base_size = 10, base_family = "Ubuntu") +
      
       ggplot2::theme(axis.text.x = element_text(angle = 315),
                     legend.position = "top") 
  
 
geofacet_deaths

ggsave(plot = geofacet_deaths, 
       filename = "figs/geofacet_deaths.png", 
       device = "png", 
       dpi = "retina",
       width = 16, 
       height = 10, 
       units = "in", 
       limitsize = FALSE)

Page 5: Sources

Data Sources:

  1. The time series data comes from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) COVID-19 dashboard data. You can find the time series files here. These files are updated daily with new cases, recovered, and deaths.

  2. The country codes come from the GDP country codes in 2016. Will be updated as needed.

  3. The covdata package from Kieran Healy contains data from the NYT database, COVID tracking project, and others.