Data Reshaping in R

class: center, middle, inverse, title-slide

# Data Reshaping in R
## bmRn CSM: Reshaping and joining data in R
### Martin Frigaard
### 2020-12-13

---

class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 7%
background-size: 9%

# Objectives

## 1) Recap `dplyr`'s data manipulation verbs

## 2) `tidyr`'s pivoting functions

## 3) Joins with `dplyr`

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 7%
background-size: 9%

# Materials

### Follow along with the exercises:
https://mjfrigaard.github.io/data-trans-joins-exercises/Index.html

### A web version of these slides is located: 
https://mjfrigaard.github.io/data-transformations-joins/Index.html#1

### The RStudio.Cloud project:
https://rstudio.cloud/project/1941654

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 9%

# Data Manipulation Recap

## We previously learned how to:

### 1) View data with `glimpse()`

### 2) Select columns with `select()`

### 3) Filter rows with `filter()`

### 4) Arrange data with `arrange()`

### 5) Create/change columns with `mutate()`

---
class: inverse, center, bottom
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 35%

# dplyr = a package for *manipulating* data

---
class: inverse, center, bottom
background-image: url(img/tidyr.png)
background-position: 50% 10%
background-size: 35%

# tidyr = a package for *reshaping* data

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 90% 10%
background-size: 12%

# Tidy data

<br>

## What are tidy data?

.pull-left[

### Observations are in rows  
### Variables are in columns   
### Values are in cells

]

.pull-right[

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 90% 10%
background-size: 12%

# Tidy data

<br>

### Values are the *intersection* of observations and variables

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 90% 10%
background-size: 8%

# Non-tidy data

### Copy and paste the code below to create `NotTidy`

```r
# copy and paste me!
NotTidy <- tibble::tribble(
       ~group,   ~`2019`,   ~`2020`,   ~`2021`,
          "A", "102/100", "123/100", "161/100",
          "B", "179/100", "199/100", "221/100",
          "C", "223/100", "146/100", "288/100")
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["group"],"name":[1],"type":["chr"],"align":["left"]},{"label":["2019"],"name":[2],"type":["chr"],"align":["left"]},{"label":["2020"],"name":[3],"type":["chr"],"align":["left"]},{"label":["2021"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"A","2":"102/100","3":"123/100","4":"161/100"},{"1":"B","2":"179/100","3":"199/100","4":"221/100"},{"1":"C","2":"223/100","3":"146/100","4":"288/100"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 90% 10%
background-size: 12%

# Non-tidy data

## Why aren't they tidy?

```r
NotTidy %>% View("NotTidy")
```

> `year` is across columns...

> multiple values in cells...

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 8%
background-size: 8%

## Tidying un-tidy data

***

--
*covered in slides*

#### `pivot_longer()` - wide to long

#### `pivot_wider()` - long to wide

***

--
*covered in exercises*

.pull-left[

`separate()` - pull columns apart  
 `separate_rows()` - split columns down rows   
 `unite()` - stick columns together  
]

.pull-right[
`unnest()` - flatten columns  
`uncount()` - duplicate rows according to a weighting variable  
]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 90% 10%
background-size: 9%

# Quick tip: viewing your data

### Make sure you view the data before assigning it to an object

<br>

### Use `glimpse()` or `View("Name")`

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 8%
background-size: 9%

## `tidyr::pivot_longer()`

.pull-left[

Make wide data long

```r
NotTidy %>% 
  pivot_longer(
    cols = -group, 
    names_to = "year", 
    values_to = "rate") %>% 
  View("Tidy")
```

]

.pull-right[

How it `pivot_longer()` works

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 8%
background-size: 9%

## `tidyr::pivot_longer()`

#### How it works:

`cols` = these are the **columns we want to reshape**

```r
NotTidy %>% # we include by omission
* pivot_longer(cols = -group,
```

`names_to` = this is the new variable that will contain the previous **column names**

```r
NotTidy %>% 
  pivot_longer(cols = -group, # new column
*              names_to = "year",
```

`values_to` = this is the new variable that will contain the reshaped **values**

```r
NotTidy %>% 
  pivot_longer(cols = -group, 
               names_to = "year", # column for values
*              values_to = "rate")
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 8%
background-size: 9%

## `tidyr::pivot_longer()`

.pull-left[

### Looks correct?

```r
NotTidy %>% 
  pivot_longer(cols = -group, 
         names_to = "year", 
        values_to = "rate") %>% 
  View("Tidy")
```

]

.pull-right[

### Not quite

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 95%
background-size: 9%

## `pivot_longer()` = names_transform

### Print `Tidy` to console

.pull-left[

```r
Tidy
```

```
## # A tibble: 9 x 3
##   group year  rate   
*##   <chr> <chr> <chr>  
## 1 A     2019  102/100
## 2 A     2020  123/100
## 3 A     2021  161/100
## 4 B     2019  179/100
## 5 B     2020  199/100
## 6 B     2021  221/100
## 7 C     2019  223/100
## 8 C     2020  146/100
## 9 C     2021  288/100
```

]

.pull-right[

### *Note the format of the columns*

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 95%
background-size: 9%

## `pivot_longer()` = names_transform

### The new `year` variable should be numeric

#### We can control this behavior with `names_transform = list()`

.pull-left[

```r
NotTidy %>% 
  pivot_longer(
       cols = -group, 
       names_to = "year", 
       values_to = "rate", 
*      names_transform = list(
*        year = as.numeric))
```

]

.pull-right[

```
## # A tibble: 9 x 3
##   group  year rate   
*##   <chr> <dbl> <chr>  
## 1 A      2019 102/100
## 2 A      2020 123/100
## 3 A      2021 161/100
## 4 B      2019 179/100
## 5 B      2020 199/100
## 6 B      2021 221/100
## 7 C      2019 223/100
## 8 C      2020 146/100
## 9 C      2021 288/100
```

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 8%
background-size: 9%

## `pivot_longer()` = names_sep

Create the `SiteRates` data

```r
SiteRates <- tibble::tribble(
             ~site, ~`2019_Q1`, ~`2019_Q2`, ~`2019_Q3`, ~`2019_Q4`,
          "Boston",         52,         31,         26,       33.4,
    "Philadelphia",       7.42,       5.51,       5.82,       6.99,
     "Cincinnati ",       6.73,       4.87,       5.02,       4.66,
           "Texas",       18.2,       16.6,         17,         19)

SiteRates %>% View("SiteRates")
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `pivot_longer()` = names_sep

### `SiteRates` has *two* variables in the same column
 
<img src="img/SiteRatesView.png" width="85%" height="85%" style="display: block; margin: auto;" />

#### We can use `names_sep ` uses a pattern to split the column (`_Q`)

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `pivot_longer()` = names_sep

#### Add `names_sep = "_Q"`

#### `year` and `quarter` should also be numeric

.pull-left[

```r
SiteRates %>% 
  pivot_longer(
  -site,
  names_to = 
    c("year", "quarter"), 
  values_to = "rate",
* names_sep = "_Q",
  names_transform = list(
*   year = as.integer,
*   quarter = as.integer)) %>%
  View("TidySites")
```

]

.pull-right[

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## More `pivot_longer()` options

Create the `SpaceDogs` data. These are names of dogs in the [Soviet Space Dogs database](https://airtable.com/universe/expG3z2CFykG1dZsp/sovet-space-dogs).

*Some dogs have multiple names, and some share names.*

```r
# data.frame prints to the screen better
SpaceDogs <- data.frame(
   date = c("1966-02-22", "1961-03-25", 
              "1961-03-09", "1960-12-22",
              "1960-12-01", "1960-09-22"),
   result = c("recovered safely", "recovered safely",
              "recovered safely", "recovered",
              "both dogs died","recovered safely"),
   name_1 = c("Ugolyok / Snezhok", "Zvezdochka", 
              "Chernuskha", "Shutka", "Mushka",
              "Kusachka / Otvazhnaya"),
   name_2 = c("Veterok / Bzdunok", NA, NA, 
              "Kometka", "Pchyolka", "Neva"))
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## More `pivot_longer()` options

```r
SpaceDogs %>% 
  View("SpaceDogs")
```

The `SpaceDogs` data has missing values in `name_2`

*...two of the columns have a similar prefix (`name_`)*

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## More `pivot_longer()` options

### To tidy these data, we can combine what we've learned:

#### 1. `dplyr::starts_with()` to select the `name_` columns

#### 2. `names_to` has a special `".value"` argument, which is the name of the new column with the `name_` values. We need to include an index column to track the values across different names (`dog_id`).

#### 3. We can remove missing values with  `values_drop_na`

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## More `pivot_longer()` options

```r
SpaceDogs %>% 
  pivot_longer( # columns with `name_` prefix
*  starts_with("name_"),
```

```r
SpaceDogs %>% 
  pivot_longer(
   starts_with("name_"), 
   names_sep = "_",
   names_to = # new column for `name_` values
*    c(".value", "dog_id"),
```

```r
SpaceDogs %>% 
  pivot_longer(
   starts_with("name_"), 
   names_sep = "_",
   names_to = 
     c(".value", "dog_id"), # remove missing
*  values_drop_na = TRUE)
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## More `pivot_longer()` options

### Add these arguments to `pivot_longer()` and add `View()`

.pull-left[

```r
SpaceDogs %>% 
  
  pivot_longer(
*  starts_with("name_"),
   names_sep = "_",
   names_to = 
     c(".value", "dog_id"), 
*  values_drop_na = TRUE) %>%
  
  View("TidySpaceDogs")
```

]

.pull-right[

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## More `pivot_longer()` options

<br>

### We can see the new `dog_id` index

### The missing values have been removed

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

# `tidyr::pivot_wider()`

<br>

## We've made wide data long, now we will make long data wide

### *But, why???*

### Wide data is usually better for displaying summaries

### Long data is better for graphing/modeling

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

# `tidyr::pivot_wider()`

<br>

### Consider the `CatVsDogWide` data

<br>

.pull-left[

#### These data come from this article in the [Washington Post](https://www.washingtonpost.com/news/wonk/wp/2014/07/28/where-cats-are-more-popular-than-dogs-in-the-u-s-and-all-over-the-world/?utm_term=.b50cb49b78b5).

]

.pull-right[

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

**Calculate percent pets per household, dog owners, and cat owners**

```r
CatVsDogWide <- tibble::tribble(
                   ~metric,   ~CA,  ~TX,  ~FL,  ~NY,  ~PA,
        "no_of_households", 12974, 9002, 7609, 7512, 5172,
    "no_of_pet_households",  6865, 5265, 4138, 3802, 2942,
    "no_of_dog_households",  4260, 3960, 2718, 2177, 1702,
    "no_of_cat_households",  3687, 2544, 2079, 2189, 1748)

CatVsDogWide %>% View("CvDWide")
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

**Calculate percent pets per household, dog owners, and cat owners**

Step 1. `pivot_long()` states into `state` column, values into `value` column

.pull-left[

```r
CatVsDogWide %>%
  pivot_longer(-metric, 
    names_to = "state", 
    values_to = "value") %>% 
  # view
  View("CvDLong")
```

]

.pull-right[

]

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

**Calculate percent pets per household, dog owners, and cat owners**

Step 1. `pivot_long()` states into `state` column, values into `value` column  
--

Step 2. `pivot_wider()` the `metric` across columns

```r
CatVsDogWide %>%
  pivot_longer(-metric, names_to = "state", values_to = "value") %>% 
  pivot_wider(names_from = "metric", values_from = "value") %>% 
  View("CvDWide")
```

*Now these data are in a format to calculate percentages!*

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

**Calculate percent pets per household, dog owners, and cat owners**

Step 1. `pivot_long()` states into `state` column, values into `value` column    
Step 2. `pivot_wider()` the `metric` and `values` across columns  
--

Step 3. `mutate()` the `perc` (percentage) columns

```r
CatVsDogWide %>% 
   pivot_longer(-metric, names_to = "state", values_to = "value") %>% 
   pivot_wider(names_from = "metric", values_from = "value") %>% 
  
  mutate(
   perc_pet_household = no_of_pet_households / no_of_households * 100,
   perc_pet_household = round(perc_pet_household, digits = 1),
   perc_dog_owners = no_of_dog_households / no_of_households * 100, 
   perc_dog_owners = round(perc_dog_owners, digits = 1),
   perc_cat_owners = no_of_cat_households / no_of_households * 100,
   perc_cat_owners = round(perc_cat_owners, digits = 1)) %>%
  
  View("CvDWide")
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

**Calculate percent pets per household, dog owners, and cat owners**

Step 1. `pivot_long()` states into `state` column, values into `value` column    
Step 2. `pivot_wider()` the `metric` and `values` across columns

Step 3. `mutate()` the `perc` (percentage) columns

***We could assign here, but why stop?***

Add `select()` helpers to reorganize data!

```r
  select(state, 
         contains("perc_dog"), contains("perc_cat"))
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

***Final code:***  
Here is the complete pipeline

```r
CatVsDogWide %>% 
  # pivots
   pivot_longer(
     -metric, names_to = "state", values_to = "value") %>% 
   pivot_wider(
     names_from = "metric", values_from = "value") %>% 
  # mutate
  mutate(
   perc_pet_household = no_of_pet_households / no_of_households * 100,
   perc_pet_household = round(perc_pet_household, digits = 1),
   perc_dog_owners = no_of_dog_households / no_of_households * 100, 
   perc_dog_owners = round(perc_dog_owners, digits = 1),
   perc_cat_owners = no_of_cat_households / no_of_households * 100,
   perc_cat_owners = round(perc_cat_owners, digits = 1)) %>% 
  # select
  select(state, 
         contains("perc_dog"), contains("perc_cat"))
```

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 5%
background-size: 9%

## `tidyr::pivot_wider()`

***Final output:***  
Here is the final summary table

---
class: left, top
background-image: url(img/tidyr.png)
background-position: 95% 90%
background-size: 8%

# More tidying in the exercises!

### `separate()` - pull columns apart

### `separate_rows()` - split columns down rows

### `unite()` - stick columns together

### `unnest()` - flatten columns

### `uncount()` - duplicate rows according to a weighting variable

---
class: inverse, center, bottom
background-image: url(img/dplyr.png)
background-position: 50% 20%
background-size: 35%

# dplyr = a package for *manipulating relational* data

---
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 8%
class: left, top

# The `dplyr` joining functions

### ***`dplyr` has functions for joining multiple  tibbles or data.frames***

### `left_join()`

### `right_join()`

### `inner_join()`

### `full_join()`

*Recall that tibbles and data.frame's are nearly identical

---
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 12%
class: left, top

# dplyr joins

.pull-left[

### Toy data `X`

```r
# create X table
X <- tibble::tribble(
    ~A, ~B, ~C, 
    "a", "t", 1L,
    "b", "u", 2L,
    "c", "v", 3L)
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1"},{"1":"b","2":"u","3":"2"},{"1":"c","2":"v","3":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

]

.pull-right[

### Toy data `Y`

```r
# create Y table
Y <- tibble::tribble(
    ~A, ~B, ~D, 
    "a", "t", 3L,
    "b", "u", 2L,
    "d", "W", 1L)
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["D"],"name":[3],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"3"},{"1":"b","2":"u","3":"2"},{"1":"d","2":"W","3":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

]

---
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 12%
class: left, top

# `dplyr` left joins

### `left_join(x = , y = )`

#### ...joins on matches *from* right-hand table (`Y`) *to* left-hand table (`X`)

.pull-left[

```r
left_join(
* x = X,
* y = Y
  )
```

Keep all data from `X`, and only matching data from `Y`

]

.pull-right[

This creates:

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"},{"1":"c","2":"v","3":"3","4":"NA"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

]

---
background-image: url(img/left-join-01.png)
background-position: 50% 80%
background-size: 45%
class: left, top

# `dplyr` left joins (1)

### Left joins use all the data from `X` (the left-hand table)

---
background-image: url(img/left-join-02.png)
background-position: 50% 70%
background-size: 85%
class: left, top

# `dplyr` left joins (2)

### Left joins include matched data in the right-hand table `Y`, and it carries over any new corresponding columns

---
background-image: url(img/left-join-03.png)
background-position: 50% 75%
background-size: 90%
class: left, top

# `dplyr` left join (3)

### The final data includes the new column(s) from `Y` (the right-hand table), and missing values for the unmatched rows.

---
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 12%
class: left, top

# `dplyr` right joins

### `right_join(x = , y = )`

#### ...join on matches *from* right-hand table (`Y`) *to* left-hand table (`X`)

.pull-left[

```r
right_join(x = X, y = Y)
```

Keep all data from `Y`, and only matching data from `X`

]

.pull-right[

This creates:

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"},{"1":"d","2":"W","3":"NA","4":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

]

---
background-image: url(img/right-join-01.png)
background-position: 50% 75%
background-size: 60%
class: left, top

# `dplyr` right joins (1)

### Right joins use all the data from `Y` (the right-hand table)

---
background-image: url(img/right-join-02.png)
background-position: 50% 75%
background-size: 85%
class: left, top

# `dplyr` right joins (2)

### Right joins include the matched data in the left-hand table `X`, and they carry over any new corresponding columns

---
background-image: url(img/right-join-03.png)
background-position: 50% 75%
background-size: 90%
class: left, top

# `dplyr` right joins (3)

### The final data includes the new column(s) from `X` (the left-hand table), and missing values for the unmatched rows.

---
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 12%
class: left, top

# `dplyr` inner joins

### `inner_join(x = , y = )`

#### ...keep only matches in *both* `x` *and* `y`

.pull-left[

```r
inner_join(x = X, y = Y)
```

Keep only the matching data from `X` and `Y`

]

.pull-right[

This creates:

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

]

---
background-image: url(img/inner-join-01.png)
background-position: 50% 75%
background-size: 50%
class: left, top

# `dplyr` inner joins (1)

### Inner joins use only the matched data from both the `X` and `Y` tables

---
background-image: url(img/inner-join-02.png)
background-position: 50% 60%
background-size: 70%
class: left, top

# `dplyr` inner joins (2)

### Columns `A` and `B` are matched in both `X` and `Y` tables

---
background-image: url(img/inner-join-03.png)
background-position: 50% 60%
background-size: 85%
class: left, top

# `dplyr` inner joins (3)

### Column `C` from table `X` gets joined on matching columns `A` and `B`

---
background-image: url(img/inner-join-04.png)
background-position: 50% 60%
background-size: 90%
class: left, top

# `dplyr` inner joins (4)

### Column `D` from table `Y` gets joined on matching columns `A` and `B`

---
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 12%
class: left, top

# `dplyr` full joins

## `full_join(x = X, y = Y)`

## ...keep *all* data in both `x` and `y`

.pull-left[

```r
full_join(x = X, y = Y)
```

Keep all data from `Y` and `X`

]

.pull-right[

This creates:

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"},{"1":"c","2":"v","3":"3","4":"NA"},{"1":"d","2":"W","3":"NA","4":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

]

---
background-image: url(img/full-join-01.png)
background-position: 50% 60%
background-size: 50%
class: left, top

# `dplyr` full joins (1)

### Full joins include all data from both tables `X` and `Y`

---
background-image: url(img/full-join-02.png)
background-position: 50% 60%
background-size: 70%
class: left, top

# `dplyr` full joins (2)

### Full joins start with all data in table `X`

---
background-image: url(img/full-join-03.png)
background-position: 50% 70%
background-size: 95%
class: left, top

# `dplyr` full joins (3)

### Full joins start with all data in table `X` and include the columns and rows from table `Y`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 10% 95%
background-size: 8%

# Resources for Data Tidying

### 1. [R for Data Science](https://r4ds.had.co.nz/tidy-data.html)

### 2. [Data Wrangling with R](https://cengel.github.io/R-data-wrangling/)

### 3. [Stack Overflow questions tagged with `tidyr`](https://stackoverflow.com/questions/tagged/tidyr)

### 4. [RStudio Community posts tagged `tidyr`](https://community.rstudio.com/tag/tidyr)