class: center, middle, inverse, title-slide # Data Reshaping in R ## bmRn CSM: Reshaping and joining data in R ### Martin Frigaard ### 2020-12-13 --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 7% background-size: 9% # Objectives ## 1) Recap `dplyr`'s data manipulation verbs ## 2) `tidyr`'s pivoting functions ## 3) Joins with `dplyr` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 7% background-size: 9% # Materials ### Follow along with the exercises: https://mjfrigaard.github.io/data-trans-joins-exercises/Index.html ### A web version of these slides is located: https://mjfrigaard.github.io/data-transformations-joins/Index.html#1 ### The RStudio.Cloud project: https://rstudio.cloud/project/1941654 --- class: left, top background-image: url(img/dplyr.png) background-position: 95% 7% background-size: 9% # Data Manipulation Recap ## We previously learned how to: -- ### 1) View data with `glimpse()` -- ### 2) Select columns with `select()` -- ### 3) Filter rows with `filter()` -- ### 4) Arrange data with `arrange()` -- ### 5) Create/change columns with `mutate()` --- class: inverse, center, bottom background-image: url(img/dplyr.png) background-position: 50% 10% background-size: 35% # dplyr = a package for *manipulating* data --- class: inverse, center, bottom background-image: url(img/tidyr.png) background-position: 50% 10% background-size: 35% # tidyr = a package for *reshaping* data --- class: left, top background-image: url(img/tidyr.png) background-position: 90% 10% background-size: 12% # Tidy data <br> ## What are tidy data? -- .pull-left[ ### Observations are in rows ### Variables are in columns ### Values are in cells ] -- .pull-right[ <img src="img/tidy-data.png" width="100%" height="100%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 90% 10% background-size: 12% # Tidy data <br> ### Values are the *intersection* of observations and variables -- <img src="img/intersection.png" width="60%" height="60%" style="display: block; margin: auto;" /> --- class: left, top background-image: url(img/tidyr.png) background-position: 90% 10% background-size: 8% # Non-tidy data ### Copy and paste the code below to create `NotTidy` ```r # copy and paste me! NotTidy <- tibble::tribble( ~group, ~`2019`, ~`2020`, ~`2021`, "A", "102/100", "123/100", "161/100", "B", "179/100", "199/100", "221/100", "C", "223/100", "146/100", "288/100") ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["group"],"name":[1],"type":["chr"],"align":["left"]},{"label":["2019"],"name":[2],"type":["chr"],"align":["left"]},{"label":["2020"],"name":[3],"type":["chr"],"align":["left"]},{"label":["2021"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"A","2":"102/100","3":"123/100","4":"161/100"},{"1":"B","2":"179/100","3":"199/100","4":"221/100"},{"1":"C","2":"223/100","3":"146/100","4":"288/100"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- class: left, top background-image: url(img/tidyr.png) background-position: 90% 10% background-size: 12% # Non-tidy data ## Why aren't they tidy? ```r NotTidy %>% View("NotTidy") ``` -- <img src="img/NotTidy.png" width="60%" height="60%" style="display: block; margin: auto;" /> -- > `year` is across columns... > multiple values in cells... --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 8% background-size: 8% ## Tidying un-tidy data *** -- *covered in slides* #### `pivot_longer()` - wide to long #### `pivot_wider()` - long to wide *** -- *covered in exercises* .pull-left[ `separate()` - pull columns apart `separate_rows()` - split columns down rows `unite()` - stick columns together ] .pull-right[ `unnest()` - flatten columns `uncount()` - duplicate rows according to a weighting variable ] --- class: left, top background-image: url(img/tidyr.png) background-position: 90% 10% background-size: 9% # Quick tip: viewing your data <br><br><br> ### Make sure you view the data before assigning it to an object <br> ### Use `glimpse()` or `View("Name")` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 8% background-size: 9% ## `tidyr::pivot_longer()` .pull-left[ Make wide data long ```r NotTidy %>% pivot_longer( cols = -group, names_to = "year", values_to = "rate") %>% View("Tidy") ``` ] -- .pull-right[ How it `pivot_longer()` works <img src="img/NotTidy-2-Tidy.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 8% background-size: 9% ## `tidyr::pivot_longer()` #### How it works: `cols` = these are the **columns we want to reshape** ```r NotTidy %>% # we include by omission * pivot_longer(cols = -group, ``` -- `names_to` = this is the new variable that will contain the previous **column names** ```r NotTidy %>% pivot_longer(cols = -group, # new column * names_to = "year", ``` -- `values_to` = this is the new variable that will contain the reshaped **values** ```r NotTidy %>% pivot_longer(cols = -group, names_to = "year", # column for values * values_to = "rate") ``` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 8% background-size: 9% ## `tidyr::pivot_longer()` .pull-left[ ### Looks correct? ```r NotTidy %>% pivot_longer(cols = -group, names_to = "year", values_to = "rate") %>% View("Tidy") ``` ] -- .pull-right[ ### Not quite <img src="img/TidyView.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 95% background-size: 9% ## `pivot_longer()` = names_transform ### Print `Tidy` to console .pull-left[ ```r Tidy ``` ``` ## # A tibble: 9 x 3 ## group year rate *## <chr> <chr> <chr> ## 1 A 2019 102/100 ## 2 A 2020 123/100 ## 3 A 2021 161/100 ## 4 B 2019 179/100 ## 5 B 2020 199/100 ## 6 B 2021 221/100 ## 7 C 2019 223/100 ## 8 C 2020 146/100 ## 9 C 2021 288/100 ``` ] -- .pull-right[ <br><br> ### *Note the format of the columns* ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 95% background-size: 9% ## `pivot_longer()` = names_transform ### The new `year` variable should be numeric #### We can control this behavior with `names_transform = list()` -- .pull-left[ ```r NotTidy %>% pivot_longer( cols = -group, names_to = "year", values_to = "rate", * names_transform = list( * year = as.numeric)) ``` ] -- .pull-right[ ``` ## # A tibble: 9 x 3 ## group year rate *## <chr> <dbl> <chr> ## 1 A 2019 102/100 ## 2 A 2020 123/100 ## 3 A 2021 161/100 ## 4 B 2019 179/100 ## 5 B 2020 199/100 ## 6 B 2021 221/100 ## 7 C 2019 223/100 ## 8 C 2020 146/100 ## 9 C 2021 288/100 ``` ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 8% background-size: 9% ## `pivot_longer()` = names_sep Create the `SiteRates` data ```r SiteRates <- tibble::tribble( ~site, ~`2019_Q1`, ~`2019_Q2`, ~`2019_Q3`, ~`2019_Q4`, "Boston", 52, 31, 26, 33.4, "Philadelphia", 7.42, 5.51, 5.82, 6.99, "Cincinnati ", 6.73, 4.87, 5.02, 4.66, "Texas", 18.2, 16.6, 17, 19) SiteRates %>% View("SiteRates") ``` -- <img src="img/SiteRates.png" width="85%" height="85%" style="display: block; margin: auto;" /> --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `pivot_longer()` = names_sep ### `SiteRates` has *two* variables in the same column <img src="img/SiteRatesView.png" width="85%" height="85%" style="display: block; margin: auto;" /> #### We can use `names_sep ` uses a pattern to split the column (`_Q`) --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `pivot_longer()` = names_sep #### Add `names_sep = "_Q"` #### `year` and `quarter` should also be numeric -- .pull-left[ ```r SiteRates %>% pivot_longer( -site, names_to = c("year", "quarter"), values_to = "rate", * names_sep = "_Q", names_transform = list( * year = as.integer, * quarter = as.integer)) %>% View("TidySites") ``` ] -- .pull-right[ <img src="img/TidySites.png" width="69%" height="69%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## More `pivot_longer()` options Create the `SpaceDogs` data. These are names of dogs in the [Soviet Space Dogs database](https://airtable.com/universe/expG3z2CFykG1dZsp/sovet-space-dogs). *Some dogs have multiple names, and some share names.* ```r # data.frame prints to the screen better SpaceDogs <- data.frame( date = c("1966-02-22", "1961-03-25", "1961-03-09", "1960-12-22", "1960-12-01", "1960-09-22"), result = c("recovered safely", "recovered safely", "recovered safely", "recovered", "both dogs died","recovered safely"), name_1 = c("Ugolyok / Snezhok", "Zvezdochka", "Chernuskha", "Shutka", "Mushka", "Kusachka / Otvazhnaya"), name_2 = c("Veterok / Bzdunok", NA, NA, "Kometka", "Pchyolka", "Neva")) ``` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## More `pivot_longer()` options ```r SpaceDogs %>% View("SpaceDogs") ``` <img src="img/SpaceDogs.png" width="85%" height="85%" style="display: block; margin: auto;" /> -- The `SpaceDogs` data has missing values in `name_2` *...two of the columns have a similar prefix (`name_`)* --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## More `pivot_longer()` options ### To tidy these data, we can combine what we've learned: #### 1. `dplyr::starts_with()` to select the `name_` columns -- #### 2. `names_to` has a special `".value"` argument, which is the name of the new column with the `name_` values. We need to include an index column to track the values across different names (`dog_id`). -- #### 3. We can remove missing values with `values_drop_na` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## More `pivot_longer()` options ```r SpaceDogs %>% pivot_longer( # columns with `name_` prefix * starts_with("name_"), ``` -- ```r SpaceDogs %>% pivot_longer( starts_with("name_"), names_sep = "_", names_to = # new column for `name_` values * c(".value", "dog_id"), ``` -- ```r SpaceDogs %>% pivot_longer( starts_with("name_"), names_sep = "_", names_to = c(".value", "dog_id"), # remove missing * values_drop_na = TRUE) ``` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## More `pivot_longer()` options ### Add these arguments to `pivot_longer()` and add `View()` .pull-left[ ```r SpaceDogs %>% pivot_longer( * starts_with("name_"), names_sep = "_", names_to = c(".value", "dog_id"), * values_drop_na = TRUE) %>% View("TidySpaceDogs") ``` ] -- .pull-right[ <img src="img/TidySpaceDogs.png" width="100%" height="100%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## More `pivot_longer()` options <br> <img src="img/SpaceDogs-2-TidySpaceDogs.png" width="100%" height="100%" style="display: block; margin: auto;" /> -- ### We can see the new `dog_id` index ### The missing values have been removed --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% # `tidyr::pivot_wider()` <br> ## We've made wide data long, now we will make long data wide ### *But, why???* ### Wide data is usually better for displaying summaries ### Long data is better for graphing/modeling --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% # `tidyr::pivot_wider()` <br> ### Consider the `CatVsDogWide` data <br> .pull-left[ #### These data come from this article in the [Washington Post](https://www.washingtonpost.com/news/wonk/wp/2014/07/28/where-cats-are-more-popular-than-dogs-in-the-u-s-and-all-over-the-world/?utm_term=.b50cb49b78b5). ] .pull-right[ <img src="img/wapo-article.png" width="100%" height="100%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` **Calculate percent pets per household, dog owners, and cat owners** ```r CatVsDogWide <- tibble::tribble( ~metric, ~CA, ~TX, ~FL, ~NY, ~PA, "no_of_households", 12974, 9002, 7609, 7512, 5172, "no_of_pet_households", 6865, 5265, 4138, 3802, 2942, "no_of_dog_households", 4260, 3960, 2718, 2177, 1702, "no_of_cat_households", 3687, 2544, 2079, 2189, 1748) CatVsDogWide %>% View("CvDWide") ``` -- <img src="img/CvDWide.png" width="88%" height="88%" style="display: block; margin: auto;" /> --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` **Calculate percent pets per household, dog owners, and cat owners** Step 1. `pivot_long()` states into `state` column, values into `value` column -- .pull-left[ ```r CatVsDogWide %>% pivot_longer(-metric, names_to = "state", values_to = "value") %>% # view View("CvDLong") ``` ] -- .pull-right[ <img src="img/CvDLong.png" width="57%" height="57%" style="display: block; margin: auto;" /> ] --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` **Calculate percent pets per household, dog owners, and cat owners** Step 1. `pivot_long()` states into `state` column, values into `value` column -- Step 2. `pivot_wider()` the `metric` across columns -- ```r CatVsDogWide %>% pivot_longer(-metric, names_to = "state", values_to = "value") %>% pivot_wider(names_from = "metric", values_from = "value") %>% View("CvDWide") ``` -- <img src="img/CvDWide2.png" width="100%" height="100%" style="display: block; margin: auto;" /> -- *Now these data are in a format to calculate percentages!* --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` **Calculate percent pets per household, dog owners, and cat owners** Step 1. `pivot_long()` states into `state` column, values into `value` column Step 2. `pivot_wider()` the `metric` and `values` across columns -- Step 3. `mutate()` the `perc` (percentage) columns -- ```r CatVsDogWide %>% pivot_longer(-metric, names_to = "state", values_to = "value") %>% pivot_wider(names_from = "metric", values_from = "value") %>% mutate( perc_pet_household = no_of_pet_households / no_of_households * 100, perc_pet_household = round(perc_pet_household, digits = 1), perc_dog_owners = no_of_dog_households / no_of_households * 100, perc_dog_owners = round(perc_dog_owners, digits = 1), perc_cat_owners = no_of_cat_households / no_of_households * 100, perc_cat_owners = round(perc_cat_owners, digits = 1)) %>% View("CvDWide") ``` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` **Calculate percent pets per household, dog owners, and cat owners** Step 1. `pivot_long()` states into `state` column, values into `value` column Step 2. `pivot_wider()` the `metric` and `values` across columns -- Step 3. `mutate()` the `perc` (percentage) columns -- <img src="img/CvDWide2-perc.png" width="100%" height="100%" style="display: block; margin: auto;" /> -- ***We could assign here, but why stop?*** -- Add `select()` helpers to reorganize data! -- ```r select(state, contains("perc_dog"), contains("perc_cat")) ``` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` ***Final code:*** Here is the complete pipeline ```r CatVsDogWide %>% # pivots pivot_longer( -metric, names_to = "state", values_to = "value") %>% pivot_wider( names_from = "metric", values_from = "value") %>% # mutate mutate( perc_pet_household = no_of_pet_households / no_of_households * 100, perc_pet_household = round(perc_pet_household, digits = 1), perc_dog_owners = no_of_dog_households / no_of_households * 100, perc_dog_owners = round(perc_dog_owners, digits = 1), perc_cat_owners = no_of_cat_households / no_of_households * 100, perc_cat_owners = round(perc_cat_owners, digits = 1)) %>% # select select(state, contains("perc_dog"), contains("perc_cat")) ``` --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 5% background-size: 9% ## `tidyr::pivot_wider()` ***Final output:*** Here is the final summary table <img src="img/CvsDPercent.png" width="85%" height="85%" style="display: block; margin: auto;" /> --- class: left, top background-image: url(img/tidyr.png) background-position: 95% 90% background-size: 8% # More tidying in the exercises! ### `separate()` - pull columns apart ### `separate_rows()` - split columns down rows ### `unite()` - stick columns together ### `unnest()` - flatten columns ### `uncount()` - duplicate rows according to a weighting variable --- class: inverse, center, bottom background-image: url(img/dplyr.png) background-position: 50% 20% background-size: 35% # dplyr = a package for *manipulating relational* data --- background-image: url(img/dplyr.png) background-position: 90% 10% background-size: 8% class: left, top # The `dplyr` joining functions ### ***`dplyr` has functions for joining multiple tibbles or data.frames*** -- ### `left_join()` -- ### `right_join()` -- ### `inner_join()` -- ### `full_join()` -- *Recall that tibbles and data.frame's are nearly identical --- background-image: url(img/dplyr.png) background-position: 90% 10% background-size: 12% class: left, top # dplyr joins .pull-left[ ### Toy data `X` ```r # create X table X <- tibble::tribble( ~A, ~B, ~C, "a", "t", 1L, "b", "u", 2L, "c", "v", 3L) ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1"},{"1":"b","2":"u","3":"2"},{"1":"c","2":"v","3":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] -- .pull-right[ ### Toy data `Y` ```r # create Y table Y <- tibble::tribble( ~A, ~B, ~D, "a", "t", 3L, "b", "u", 2L, "d", "W", 1L) ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["D"],"name":[3],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"3"},{"1":"b","2":"u","3":"2"},{"1":"d","2":"W","3":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- background-image: url(img/dplyr.png) background-position: 90% 10% background-size: 12% class: left, top # `dplyr` left joins ### `left_join(x = , y = )` #### ...joins on matches *from* right-hand table (`Y`) *to* left-hand table (`X`) -- .pull-left[ ```r left_join( * x = X, * y = Y ) ``` Keep all data from `X`, and only matching data from `Y` ] -- .pull-right[ This creates: <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"},{"1":"c","2":"v","3":"3","4":"NA"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- background-image: url(img/left-join-01.png) background-position: 50% 80% background-size: 45% class: left, top # `dplyr` left joins (1) ### Left joins use all the data from `X` (the left-hand table) --- background-image: url(img/left-join-02.png) background-position: 50% 70% background-size: 85% class: left, top # `dplyr` left joins (2) ### Left joins include matched data in the right-hand table `Y`, and it carries over any new corresponding columns --- background-image: url(img/left-join-03.png) background-position: 50% 75% background-size: 90% class: left, top # `dplyr` left join (3) ### The final data includes the new column(s) from `Y` (the right-hand table), and missing values for the unmatched rows. --- background-image: url(img/dplyr.png) background-position: 90% 10% background-size: 12% class: left, top # `dplyr` right joins ### `right_join(x = , y = )` #### ...join on matches *from* right-hand table (`Y`) *to* left-hand table (`X`) -- .pull-left[ ```r right_join(x = X, y = Y) ``` Keep all data from `Y`, and only matching data from `X` ] -- .pull-right[ This creates: <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"},{"1":"d","2":"W","3":"NA","4":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- background-image: url(img/right-join-01.png) background-position: 50% 75% background-size: 60% class: left, top # `dplyr` right joins (1) ### Right joins use all the data from `Y` (the right-hand table) --- background-image: url(img/right-join-02.png) background-position: 50% 75% background-size: 85% class: left, top # `dplyr` right joins (2) ### Right joins include the matched data in the left-hand table `X`, and they carry over any new corresponding columns --- background-image: url(img/right-join-03.png) background-position: 50% 75% background-size: 90% class: left, top # `dplyr` right joins (3) ### The final data includes the new column(s) from `X` (the left-hand table), and missing values for the unmatched rows. --- background-image: url(img/dplyr.png) background-position: 90% 10% background-size: 12% class: left, top # `dplyr` inner joins ### `inner_join(x = , y = )` #### ...keep only matches in *both* `x` *and* `y` -- .pull-left[ ```r inner_join(x = X, y = Y) ``` Keep only the matching data from `X` and `Y` ] -- .pull-right[ This creates: <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- background-image: url(img/inner-join-01.png) background-position: 50% 75% background-size: 50% class: left, top # `dplyr` inner joins (1) ### Inner joins use only the matched data from both the `X` and `Y` tables --- background-image: url(img/inner-join-02.png) background-position: 50% 60% background-size: 70% class: left, top # `dplyr` inner joins (2) ### Columns `A` and `B` are matched in both `X` and `Y` tables --- background-image: url(img/inner-join-03.png) background-position: 50% 60% background-size: 85% class: left, top # `dplyr` inner joins (3) ### Column `C` from table `X` gets joined on matching columns `A` and `B` --- background-image: url(img/inner-join-04.png) background-position: 50% 60% background-size: 90% class: left, top # `dplyr` inner joins (4) ### Column `D` from table `Y` gets joined on matching columns `A` and `B` --- background-image: url(img/dplyr.png) background-position: 90% 10% background-size: 12% class: left, top # `dplyr` full joins ## `full_join(x = X, y = Y)` ## ...keep *all* data in both `x` and `y` -- .pull-left[ ```r full_join(x = X, y = Y) ``` Keep all data from `Y` and `X` ] -- .pull-right[ This creates: <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["A"],"name":[1],"type":["chr"],"align":["left"]},{"label":["B"],"name":[2],"type":["chr"],"align":["left"]},{"label":["C"],"name":[3],"type":["int"],"align":["right"]},{"label":["D"],"name":[4],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"t","3":"1","4":"3"},{"1":"b","2":"u","3":"2","4":"2"},{"1":"c","2":"v","3":"3","4":"NA"},{"1":"d","2":"W","3":"NA","4":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- background-image: url(img/full-join-01.png) background-position: 50% 60% background-size: 50% class: left, top # `dplyr` full joins (1) ### Full joins include all data from both tables `X` and `Y` --- background-image: url(img/full-join-02.png) background-position: 50% 60% background-size: 70% class: left, top # `dplyr` full joins (2) ### Full joins start with all data in table `X` --- background-image: url(img/full-join-03.png) background-position: 50% 70% background-size: 95% class: left, top # `dplyr` full joins (3) ### Full joins start with all data in table `X` and include the columns and rows from table `Y` --- class: left, top background-image: url(img/dplyr.png) background-position: 10% 95% background-size: 8% # Resources for Data Tidying ### 1. [R for Data Science](https://r4ds.had.co.nz/tidy-data.html) ### 2. [Data Wrangling with R](https://cengel.github.io/R-data-wrangling/) ### 3. [Stack Overflow questions tagged with `tidyr`](https://stackoverflow.com/questions/tagged/tidyr) ### 4. [RStudio Community posts tagged `tidyr`](https://community.rstudio.com/tag/tidyr)