layout: true

<!-- this adds the link footer to all slides, depends on footer-small class in css-->

<div class="footer-small"><span>https://github.com/mjfrigaard/csuc-data-journalism</span></div>

---
name: title-slide
class: title-slide, center, middle, inverse

# Data Manipulation with R
#.fancy[An introduction to the `dplyr` package]

<br>

.large[by Martin Frigaard]

Written: September 21 2021

Updated: November 30 2021

.footer-large[.right[.fira[
<br><br><br><br><br>[Created using the "λέξις" theme](https://jhelvy.github.io/lexis/index.html#what-does-%CE%BB%CE%AD%CE%BE%CE%B9%CF%82-mean)
]]]

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 10%
background-size: 6%

# Objectives

## 1) Common data manipulation tasks

## 2) `dplyr`'s verbs

## 3) The pipe `%>%`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 10%
background-size: 6%

# Materials

### Follow along with the exercises:

https://mjfrigaard.github.io/csuc-data-journalism/lessons.html

### A web version of these slides is available at:

https://mjfrigaard.github.io/csuc-data-journalism/slides.html

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 10%
background-size: 6%

# What are common data manipulation tasks?

.leftcol[

### 1. Viewing the dataset

### 2. Choosing columns/rows

### 3. Ordering rows

]

--

.rightcol[

### 4. Changing existing columns

### 5. Creating or calculating new columns

]

---
class: inverse, center, bottom
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 35%

# `dplyr` = a grammar for data manipulation

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 10%
background-size: 6%

# `dplyr` = "dee + ply + ARRRR"

.leftcol[

#### *Pliers* are tools for grasping or manipulating common objects

.border[
<img src="img/pliers.png" width="55%" height="55%" style="display: block; margin: auto;" />
]

]

--

.rightcol[

#### The `dplyr` package has a variety of verbs for performing common data manipulations

]

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 6%

# The `starwars` dataset

### These data come from the Star Wars API:

<img src="img/star-wars.jpeg" width="40%" height="40%" style="display: block; margin: auto;" />

--

#### Read more about the data here: https://dplyr.tidyverse.org/reference/starwars.html

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 9%
background-size: 6%

# Load the `starwars` dataset

### The `starwars` dataset comes with the `dplyr` package, so we can access it using the code below:

```r
install.packages("dplyr")
library(dplyr)
dplyr::starwars
```

--

### We'll use a smaller version of this dataset (`original_starwars`) to show `dplyr`'s common data manipulation verbs

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 7%

# Import `original_starwars`

--

#### Import the data from its URL using `readr`

```r
library(readr)
original_starwars <- read_csv("https://bit.ly/mini-strwrs")
```

--

#### This loads the dataset into our ***Environment*** pane

.border[
<img src="img/starwars-env.png" width="90%" height="90%" style="display: block; margin: auto;" />
]

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 90% 10%
background-size: 6%

# `dplyr` verbs

### The primary verbs for data manipulation in `dplyr`:

#### `glimpse()`

#### `select()`

#### `filter()`

#### `arrange()`

#### `mutate()`

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 25%

<br><br><br><br><br><br><br><br>

# Viewing the data = `glimpse()`

### *We need to view the data we're manipulating to see if the changes are correct*

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 7%

# View = `glimpse()`

#### Take a look at the entire dataset using `dplyr::glimpse()`

--

```r
glimpse(original_starwars)
```

```
Rows: 6
Columns: 6
$ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Leia Organa", "Chewbac…
$ height     <dbl> 172, 167, 96, 150, 228, 180
$ mass       <dbl> 77, 75, 32, 49, 112, 80
$ hair_color <chr> "blond", NA, NA, "brown", "brown", "brown"
$ species    <chr> "Human", "Droid", "Droid", "Human", "Wookiee", "Human"
$ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Alderaan", "Kashyyyk", "C…
```

#### `glimpse()` transposes the data (each column is printed as a row) and prints as much of it to the screen as possible

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 7%

## View the data in the ***Console***

*Enter the name of the dataset to print it to the Console*

```r
original_starwars
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine"},{"1":"C-3PO","2":"167","3":"75","4":"NA","5":"Droid","6":"Tatooine"},{"1":"R2-D2","2":"96","3":"32","4":"NA","5":"Droid","6":"Naboo"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan"},{"1":"Chewbacca","2":"228","3":"112","4":"brown","5":"Wookiee","6":"Kashyyyk"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 7%

## View the data in the *Data Viewer*

#### View the `original_starwars` dataset using RStudio's data viewer

<img src="img/starwars-dataviewer.png" width="50%" height="50%" style="display: block; margin: auto;" />

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 25%

<br><br><br><br><br><br>

# Choosing columns = `select()`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 15%
background-size: 6%

## Choose columns with `select()`

#### `select()` allows us to pick specific columns out of a dataset

```r
select(original_starwars, name, homeworld, species)
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[2],"type":["chr"],"align":["left"]},{"label":["species"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"Tatooine","3":"Human"},{"1":"C-3PO","2":"Tatooine","3":"Droid"},{"1":"R2-D2","2":"Naboo","3":"Droid"},{"1":"Leia Organa","2":"Alderaan","3":"Human"},{"1":"Chewbacca","2":"Kashyyyk","3":"Wookiee"},{"1":"Han Solo","2":"Corellia","3":"Human"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 15%
background-size: 6%

## Choose columns with `select()`

#### We can use negation (`-`) to remove columns

```r
select(original_starwars, -c(mass, height, hair_color))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["species"],"name":[2],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"Human","3":"Tatooine"},{"1":"C-3PO","2":"Droid","3":"Tatooine"},{"1":"R2-D2","2":"Droid","3":"Naboo"},{"1":"Leia Organa","2":"Human","3":"Alderaan"},{"1":"Chewbacca","2":"Wookiee","3":"Kashyyyk"},{"1":"Han Solo","2":"Human","3":"Corellia"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 9%
background-size: 6%

# `select()` helpers

#### `select()` comes with 'helpers' that make choosing columns easier (and reduce typing!)

Helper          | What it does
:-------------- | :--------------------------------------
`starts_with()` | choose columns whose names start with...
`ends_with()`   | choose columns whose names end with...
`contains()`    | choose columns whose names contain...
`matches()`     | choose columns whose names match a regular expression...
`one_of()`      | choose columns from a set of names...
`num_range()`   | choose columns with a common prefix and a numeric range (e.g., `x1`, `x2`, `x3`)...
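---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 15%
background-size: 6%

## Try a few `select()` helpers

A minimal sketch that tries three helpers from the table on `original_starwars`. It assumes the six-row dataset is rebuilt by hand with `tibble()` from the values printed on the earlier slides (rather than imported from the bit.ly URL), so it runs without an internet connection.

```r
library(dplyr)

# Assumption: original_starwars rebuilt by hand from the printed output,
# not downloaded from https://bit.ly/mini-strwrs
original_starwars <- tibble(
  name       = c("Luke Skywalker", "C-3PO", "R2-D2", "Leia Organa", "Chewbacca", "Han Solo"),
  height     = c(172, 167, 96, 150, 228, 180),
  mass       = c(77, 75, 32, 49, 112, 80),
  hair_color = c("blond", NA, NA, "brown", "brown", "brown"),
  species    = c("Human", "Droid", "Droid", "Human", "Wookiee", "Human"),
  homeworld  = c("Tatooine", "Tatooine", "Naboo", "Alderaan", "Kashyyyk", "Corellia")
)

# columns whose names start with "h": height, hair_color, homeworld
starts_h <- select(original_starwars, starts_with("h"))

# columns whose names end with "s": mass, species
ends_s <- select(original_starwars, ends_with("s"))

# columns whose names contain "or": hair_color, homeworld
contains_or <- select(original_starwars, contains("or"))

names(starts_h)
names(ends_s)
names(contains_or)
```

The helpers match on column *names*, not on the values inside the columns, and the chosen columns keep their original order.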
---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 15%
background-size: 6%

## Choose columns with `select()`

### Select columns using `matches()`

```r
select(original_starwars, name, matches("_"))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["hair_color"],"name":[2],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"blond"},{"1":"C-3PO","2":"NA"},{"1":"R2-D2","2":"NA"},{"1":"Leia Organa","2":"brown"},{"1":"Chewbacca","2":"brown"},{"1":"Han Solo","2":"brown"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 15%
background-size: 20%

<br><br><br><br><br><br>

# See the `select()` exercises for more examples!

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 25%

<br><br><br><br><br><br>

# Choosing rows with `filter()`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 7%

# Choose rows with `filter()`

### `filter()` lets us pull out rows based on logical conditions

```r
filter(original_starwars, species == "Human")
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Choose rows with `filter()`

#### `filter()` logical conditions include:

Logical Test | Meaning
:--------------: | :--------------------------:
`<` | Less than
`>` | Greater than
`==` | Equal to
`<=` | Less than or equal to
`>=` | Greater than or equal to
`!=` | Not equal to
`%in%` | Group membership (value is in a set)
`is.na()` | is NA (missing)
`!is.na()` | is not NA (non-missing)

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Choose rows with `filter()`

### Combine logical conditions with `&` or `,`

*this gets the same results...*

```r
filter(original_starwars, species == "Human" & !is.na(hair_color))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Choose rows with `filter()`

### Combine logical conditions with `&` or `,`

*...as this*

```r
filter(original_starwars, species == "Human", !is.na(hair_color))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Choose rows with `filter()`

#### Remember that *any* function returning `TRUE`/`FALSE` values works inside `filter()`, so we can borrow functions from other packages to help us (like `stringr::str_detect()`)

--

```r
library(stringr)
filter(original_starwars, str_detect(string = name, pattern = "[:digit:]"))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"C-3PO","2":"167","3":"75","4":"NA","5":"Droid","6":"Tatooine"},{"1":"R2-D2","2":"96","3":"32","4":"NA","5":"Droid","6":"Naboo"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 15%
background-size: 20%

<br><br><br><br><br><br><br>

# See the `filter()` exercises for more examples!

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 25%

<br><br><br><br><br><br>

# Sorting rows with `arrange()`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Sort rows with `arrange()`

### `arrange()` sorts the rows of a dataset by the values in a column (ascending or descending)

```r
arrange(original_starwars, height)
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"R2-D2","2":"96","3":"32","4":"NA","5":"Droid","6":"Naboo"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan"},{"1":"C-3PO","2":"167","3":"75","4":"NA","5":"Droid","6":"Tatooine"},{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia"},{"1":"Chewbacca","2":"228","3":"112","4":"brown","5":"Wookiee","6":"Kashyyyk"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Sort rows with `arrange()`

### `arrange()` sorts in ascending order by default; wrap a column in `desc()` to sort in descending order

```r
arrange(original_starwars, desc(height))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"Chewbacca","2":"228","3":"112","4":"brown","5":"Wookiee","6":"Kashyyyk"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia"},{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine"},{"1":"C-3PO","2":"167","3":"75","4":"NA","5":"Droid","6":"Tatooine"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan"},{"1":"R2-D2","2":"96","3":"32","4":"NA","5":"Droid","6":"Naboo"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 15%
background-size: 20%

<br><br><br><br><br><br><br>

# See the `arrange()` exercises for more examples!
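---
class: left, top
background-image: url(img/dplyr.png)
background-position: 95% 7%
background-size: 6%

# Sort by more than one column

`arrange()` also accepts several columns: each later column breaks ties in the ones before it. The sketch below assumes, as before, that `original_starwars` is rebuilt by hand from the values printed on the earlier slides.

```r
library(dplyr)

# Assumption: original_starwars rebuilt by hand from the printed output
original_starwars <- tibble(
  name       = c("Luke Skywalker", "C-3PO", "R2-D2", "Leia Organa", "Chewbacca", "Han Solo"),
  height     = c(172, 167, 96, 150, 228, 180),
  mass       = c(77, 75, 32, 49, 112, 80),
  hair_color = c("blond", NA, NA, "brown", "brown", "brown"),
  species    = c("Human", "Droid", "Droid", "Human", "Wookiee", "Human"),
  homeworld  = c("Tatooine", "Tatooine", "Naboo", "Alderaan", "Kashyyyk", "Corellia")
)

# sort by species (A to Z), then by height (tallest first) within each species
by_species <- arrange(original_starwars, species, desc(height))
by_species$name
# "C-3PO" "R2-D2" "Han Solo" "Luke Skywalker" "Leia Organa" "Chewbacca"

# missing values (NA) in the sorting column end up at the bottom
by_hair <- arrange(original_starwars, hair_color)
by_hair$hair_color
```

According to the `dplyr` documentation, `NA`s are sorted to the end for local data frames even when the column is wrapped in `desc()`.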
---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 10%
background-size: 25%

<br><br><br><br><br><br>

# Creating columns with `mutate()`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 6%

# Create columns with `mutate()`

```r
mutate(original_starwars,
       # create new bmi variable
       bmi = mass / ((height / 100) ^ 2))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]},{"label":["bmi"],"name":[7],"type":["dbl"],"align":["right"]}],"data":[{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine","7":"26.02758"},{"1":"C-3PO","2":"167","3":"75","4":"NA","5":"Droid","6":"Tatooine","7":"26.89232"},{"1":"R2-D2","2":"96","3":"32","4":"NA","5":"Droid","6":"Naboo","7":"34.72222"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan","7":"21.77778"},{"1":"Chewbacca","2":"228","3":"112","4":"brown","5":"Wookiee","6":"Kashyyyk","7":"21.54509"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia","7":"24.69136"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 6%

# Change existing columns with `mutate()`

```r
mutate(original_starwars,
       # create bmi
       bmi = mass / ((height / 100) ^ 2),
       # change bmi
       bmi = round(bmi, digits = 0))
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["height"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mass"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["hair_color"],"name":[4],"type":["chr"],"align":["left"]},{"label":["species"],"name":[5],"type":["chr"],"align":["left"]},{"label":["homeworld"],"name":[6],"type":["chr"],"align":["left"]},{"label":["bmi"],"name":[7],"type":["dbl"],"align":["right"]}],"data":[{"1":"Luke Skywalker","2":"172","3":"77","4":"blond","5":"Human","6":"Tatooine","7":"26"},{"1":"C-3PO","2":"167","3":"75","4":"NA","5":"Droid","6":"Tatooine","7":"27"},{"1":"R2-D2","2":"96","3":"32","4":"NA","5":"Droid","6":"Naboo","7":"35"},{"1":"Leia Organa","2":"150","3":"49","4":"brown","5":"Human","6":"Alderaan","7":"22"},{"1":"Chewbacca","2":"228","3":"112","4":"brown","5":"Wookiee","6":"Kashyyyk","7":"22"},{"1":"Han Solo","2":"180","3":"80","4":"brown","5":"Human","6":"Corellia","7":"25"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 15%
background-size: 20%

<br><br><br><br><br><br>

# See the `mutate()` exercises for more examples!
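---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 7%
background-size: 6%

# Build on new columns inside `mutate()`

Inside one `mutate()` call, a column created by an earlier argument can be used by the arguments that follow it (the rounding slide does this with `bmi`). The sketch below splits the BMI formula into two steps; the intermediate `height_m` column is a hypothetical name introduced only for illustration, and `original_starwars` is rebuilt by hand from the printed values.

```r
library(dplyr)

# Assumption: original_starwars rebuilt by hand from the printed output
original_starwars <- tibble(
  name       = c("Luke Skywalker", "C-3PO", "R2-D2", "Leia Organa", "Chewbacca", "Han Solo"),
  height     = c(172, 167, 96, 150, 228, 180),
  mass       = c(77, 75, 32, 49, 112, 80),
  hair_color = c("blond", NA, NA, "brown", "brown", "brown"),
  species    = c("Human", "Droid", "Droid", "Human", "Wookiee", "Human"),
  homeworld  = c("Tatooine", "Tatooine", "Naboo", "Alderaan", "Kashyyyk", "Corellia")
)

with_bmi <- mutate(original_starwars,
                   # hypothetical helper column: height converted from cm to m
                   height_m = height / 100,
                   # height_m is available here because it was created just above
                   bmi = mass / height_m ^ 2)

select(with_bmi, name, height_m, bmi)
```

For Luke Skywalker this works out to 77 / 1.72² ≈ 26.03, the same value as the `bmi` column printed earlier.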
---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 15%
background-size: 20%

<br><br><br><br><br><br>

# Write clearer code with the pipe `%>%`

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 10%
background-size: 6%

# The pipe (`%>%`)

### The pipe comes from the `magrittr` package: https://magrittr.tidyverse.org/

--

### The pipe makes our code easier to read (and write)

### Create pipes quickly with a keyboard shortcut

--

.leftcol[.center[
***Mac***
#### `Cmd` + `Shift` + `M`
]]

--

.rightcol[.center[
***Windows***
#### `Ctrl` + `Shift` + `M`
]]

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 10%
background-size: 6%

# How the pipe (`%>%`) works

--

### Without the pipe, we have to assign each output to a new object:

--

<img src="img/pipe-00.png" width="70%" height="70%" style="display: block; margin: auto;" />

--

### Or nest the functions inside each other:

--

<img src="img/pipe-01.png" width="90%" height="90%" style="display: block; margin: auto;" />

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 93% 10%
background-size: 6%

# How the pipe (`%>%`) works

### The pipe passes the output of the function on its left to the first argument of the function on its right

--

<img src="img/pipe-03.png" width="48%" height="48%" style="display: block; margin: auto;" />

.center[
***`%>%` can be read as "then"***
]

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 97% 8%
background-size: 6%

# Creating pipelines of functions

### Review the code below and think about what each object contains:

1. Filter `original_starwars` to only brown-haired characters over 100 cm tall
2. Create a `bmi` column using: `mass / ((height / 100) ^ 2)`
3. Select `name`, `bmi`, and `homeworld`
4. Arrange the data by `bmi`, descending

```r
object_01 <- filter(original_starwars, hair_color == "brown" & height > 100)
object_02 <- mutate(object_01, bmi = mass / ((height / 100) ^ 2))
object_03 <- select(object_02, name, bmi, homeworld)
object_04 <- arrange(object_03, desc(bmi))
```

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 97% 8%
background-size: 6%

# Creating pipelines of functions

### Rewrite this code as a single pipeline that creates one output (`new_original_starwars`)

1. Filter `original_starwars` to only brown-haired characters over 100 cm tall
2. Create a `bmi` column using: `mass / ((height / 100) ^ 2)`
3. Select `name`, `bmi`, and `homeworld`
4. Arrange the data by `bmi`, descending

--

```r
original_starwars %>%
  filter(hair_color == "_____" & height > ___) %>%
  mutate(___ = mass / ((height / 100) ^ 2)) %>%
  select(____, bmi, _________) %>%
  arrange(____(bmi)) -> new_original_starwars
```

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 97% 8%
background-size: 6%

# Creating pipelines of functions

The answer is below:

```r
original_starwars %>%
  filter(hair_color == "brown" & height > 100) %>%
  mutate(bmi = mass / ((height / 100) ^ 2)) %>%
  select(name, bmi, homeworld) %>%
  arrange(desc(bmi)) -> new_original_starwars
new_original_starwars
```

--

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["bmi"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["homeworld"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"Han Solo","2":"24.69136","3":"Corellia"},{"1":"Leia Organa","2":"21.77778","3":"Alderaan"},{"1":"Chewbacca","2":"21.54509","3":"Kashyyyk"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: inverse, center, middle
background-image: url(img/dplyr.png)
background-position: 50% 15%
background-size: 20%

<br><br><br><br><br><br>

# See the `pipe` exercises for more examples!

---
class: left, top
background-image: url(img/dplyr.png)
background-position: 10% 95%
background-size: 6%

# Resources for Data Manipulation

### 1. [R for Data Science](https://r4ds.had.co.nz/transform.html)

### 2. [Data Wrangling with R](https://cengel.github.io/R-data-wrangling/)

### 3. [Stack Overflow questions tagged with `dplyr`](https://stackoverflow.com/questions/tagged/dplyr)

### 4. [RStudio Community posts tagged `dplyr`](https://community.rstudio.com/tag/dplyr)