layout: true

<!-- this adds the link footer to all slides, depends on footer-small class in css-->

<div class="footer-small"><span>https://github.com/mjfrigaard/ph-lacounty-r/</span></div>

---
name: title-slide
class: title-slide, center, middle, inverse

# Importing Data

# .fancy[Getting Data into RStudio]

<br>

.large[by Martin Frigaard]

Written: October 03 2022

Updated: December 02 2022

.footer-large[.right[.fira[
<br><br><br><br><br>[Created using the "λέξις" theme](https://jhelvy.github.io/lexis/index.html#what-does-%CE%BB%CE%AD%CE%BE%CE%B9%CF%82-mean)
]]]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Materials

The slides are in the `slides.pdf` file

--

The materials for this training are in the `worksheets` folder:

```
worksheets
├── import.Rmd
├── export.Rmd
├── objects.Rmd
├── rmd-basic.Rmd
├── rmd-tables.Rmd
└── rmd-visualizations.Rmd
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Outline

<br>

.leftcol[
#### 1. .red[Importing data]
#### 2. Common Data Objects
#### 3. R Markdown
]

--

.rightcol[
#### 4. R Markdown Data Visualizations
#### 5. R Markdown Tables
#### 6. Exporting Data
]

---
background-image: url(www/pdg-hex.png)
class: center, middle, inverse
background-position: 96% 4%
background-size: 6%

# .large[Import Data]

--

<br><br>

.font90[.green[Open `import.Rmd` to follow along]]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data

#### Packages for importing data:

| File type                           | Package               |
|-------------------------------------|-----------------------|
| SAS (`.sas7bdat`)                   | `haven`               |
| Excel (`.xlsx`, `.xls`)             | `readxl`, `openxlsx`  |
| Plain text (`.csv`, `.tsv`, `.txt`) | `readr`, `data.table` |

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Environment*)

#### The .blue[Environment] Pane

<img src="www/rstudio-env.png" width="100%" height="100%" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Import Dataset*)

.leftcol[
#### Click .blue[Import Dataset]

#### Click .blue[From SAS]
]

.rightcol[
<img src="www/rstudio-import-dataset.png" width="100%" height="100%" style="display: block; margin: auto auto auto 0;" />
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Required Packages*)

#### If you see a prompt to install required packages, click .blue[Yes]

<img src="www/rstudio-dep-pkgs.png" width="60%" height="60%" style="display: block; margin: auto;" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Dialogue Box*)

.leftcol20[
.font80[**You will see the .blue[Import Statistical Data] dialogue box**]

.font80[**Click .blue[Browse] and navigate to the `data/medical.sas7bdat` file**]
]

.rightcol80[
<img src="www/rstudio-import-dialogue-01.png" width="80%" height="80%" style="display: block; margin: auto 0 auto auto;" />
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Dialogue Box*)

.leftcol20[
.font80[**You will see the path in .blue[File/URL]**]

.font80[**A preview of the data will appear in .blue[Data Preview]**]
]

.rightcol80[
<img src="www/rstudio-import-dialogue-02.png" width="70%" height="70%" style="display: block; margin: auto 0 auto auto;" />

<img src="www/rstudio-import-dialogue-03.png" width="90%" height="90%" style="display: block; margin: auto 0 auto auto;" />
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Dialogue Box*)

.font90[**We also have additional .blue[Import Options]**]

<img src="www/rstudio-import-dialogue-04.png" width="90%" height="90%" style="display: block; margin: auto 0 auto auto;" />

--

.font90[**We also see a .blue[Code Preview]. Click the small copy icon, then click .blue[Import]**]

<img src="www/rstudio-import-dialogue-05.png" width="90%" height="90%" style="display: block; margin: auto 0 auto auto;" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Data Viewer*)

.font90[**RStudio imports the data and opens it in the .blue[Data Viewer]**]

<img src="www/rstudio-import-dialogue-06.png" width="100%" height="100%" style="display: block; margin: auto 0 auto auto;" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (*Data Viewer*)

.font90[**We can also see that `medical` has been added to the .blue[Environment] pane**]

<img src="www/rstudio-import-dialogue-07.png" width="95%" height="95%" style="display: block; margin: auto 0 auto auto;" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data

--

<br>

### .font120[Is what we did reproducible?]
--

<br>

.font120[***.red[No, but it can be!]***]

--

<br>

.font120[Open `import.Rmd` from the `worksheets` folder]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data

.leftcol20[
In `import.Rmd`

- Instructions inside `#` boxes won't run

- Fill in `author` and `date` (inside the quotes)
]

.rightcol80[
<img src="www/import-instructions.png" width="75%" height="75%" style="display: block; margin: auto 0 auto auto;" />
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (from local)

We already have the code to import the local `medical.sas7bdat` file

<img src="www/rstudio-import-dialogue-05.png" width="90%" height="90%" style="display: block; margin: auto 0 auto auto;" />

--

`import.Rmd` runs from inside the `worksheets/` folder, so we need to adjust the file path to `../data/medical.sas7bdat`

.leftcol[
.code70[
```
.                      # importing with dialogue
└── data/
    └── medical.sas7bdat
```
]
]

.rightcol[
.code70[
```
.                      # importing from file
├── data/
│   └── medical.sas7bdat
└── worksheets/
    └── import.Rmd
```
]
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (download and import)

<br>

We can also download the file from a URL with `download.file()`

.code70[
```r
download.file(
* url = "http://www.principlesofeconometrics.com/sas/medical.sas7bdat",
  )
```
]

--

And save it to a local `destfile`

.code70[
```r
download.file(
  url = "http://www.principlesofeconometrics.com/sas/medical.sas7bdat",
* destfile = "../data/downloads/medical.sas7bdat",
  mode = "wb") # binary mode keeps the file intact on Windows
```
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (download and import)

<br>

.font120[Now we can import the file from our `downloads/` folder]

.leftcol35[
.code60[
```
.                      # importing from downloads folder
├── data/
│   ├── medical.sas7bdat
│   └── downloads/
│       └── medical.sas7bdat
└── worksheets/
    └── import.Rmd
```
]
]

.rightcol65[
<br>

.code70[
```r
library(haven)
medical <- read_sas("../data/downloads/medical.sas7bdat")
```
]
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (parameters)

.font90[For a more permanent solution, we can store file locations (or other metadata) as parameters in the YAML header of our R Markdown file]

.code70[
```yaml
title: "May Report"
author: "Joe Smith"
date: "2022-11-30"
output: html_document
params:
  sas_data_url: "http://www.principlesofeconometrics.com/sas/medical.sas7bdat"
  sas_data_dir: "../data/sas"
```
]

--

.code70[
```r
*download.file(url = params$sas_data_url, )
```
]

--

.code70[
```r
download.file(url = params$sas_data_url,
*             destfile = file.path(params$sas_data_dir, "medical.sas7bdat"),
              mode = "wb")
```
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (multiple files)

If we have a folder with multiple files, we can reduce duplicated code with iteration.

--

.leftcol[
.code80[
```
.                      # importing multiple files
├── data/sas/
│   ├── elemapi-2000.sas7bdat
│   ├── elemapi2-2000.sas7bdat
│   ├── hsb2.sas7bdat
│   └── nations.sas7bdat
└── worksheets/
    └── import.Rmd
```
]
]

--

.rightcol[
.code80[
```r
# create vector of SAS file paths
sas_filenames <- list.files(
  path = "../data/sas",
  pattern = "\\.sas7bdat$",
  full.names = TRUE)

all_sas_data <- sas_filenames |>
  # name each element with its file path
  purrr::set_names() |>
  # use read_sas() on all files
  purrr::map(.f = read_sas)
```
]

.font90[`all_sas_data` is a list of datasets]
]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (multiple files)

.font90[Each dataset is named with its file path in `data/sas/`]

.code55[
```r
str(all_sas_data)
*# $ ../data/sas/elemapi-2000.sas7bdat : tibble [400 × 21] (S3: tbl_df/tbl/data.frame)
# ..$ snum : num [1:400] 906 889 887 876 888 ...
# .. ..- attr(*, "label")= chr "school number"
# ..$ dnum : num [1:400] 41 41 41 41 41 98 98 108 108 108 ...
# .. ..- attr(*, "label")= chr "district number"
# .. [list output truncated]
*# $ ../data/sas/elemapi2-2000.sas7bdat: tibble [400 × 22] (S3: tbl_df/tbl/data.frame)
# ..$ snum : num [1:400] 906 889 887 876 888 ...
# .. ..- attr(*, "label")= chr "school number"
# ..$ dnum : num [1:400] 41 41 41 41 41 98 98 108 108 108 ...
# .. ..- attr(*, "label")= chr "district number"
# .. [list output truncated]
*# $ ../data/sas/hsb2.sas7bdat : tibble [200 × 11] (S3: tbl_df/tbl/data.frame)
# ..$ id : num [1:200] 3 5 16 35 8 19 6 1 4 22 ...
# ..$ female : num [1:200] 0 0 0 1 1 1 1 1 1 0 ...
# .. [list output truncated]
*# $ ../data/sas/nations.sas7bdat : tibble [109 × 15] (S3: tbl_df/tbl/data.frame)
# ..$ country : chr [1:109] "Algeria" "Argentin" "Australi" "Austria" ...
# .. ..- attr(*, "label")= chr "Country"
# ..$ pop : num [1:109] 21.9 30.5 15.8 7.6 100.6 ...
# .. ..- attr(*, "label")= chr "1985 population in millions"
# .. [list output truncated]
```
]
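---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Importing Data (shorter names)

.font80[Full file paths make long list names. One possible tweak (a sketch, not part of the worksheet) is to build the names with `basename()` and `tools::file_path_sans_ext()` before calling `purrr::map()`. So it runs anywhere, this sketch writes two made-up CSV files to a hypothetical `import-demo` temporary folder and uses `read.csv()` in place of `read_sas()`.]

.code55[
```r
# Sketch: made-up CSV files stand in for the .sas7bdat files,
# and read.csv() stands in for read_sas()
tmp_dir <- file.path(tempdir(), "import-demo")  # hypothetical folder
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, score = c(52, 59, 33)),
          file.path(tmp_dir, "hsb2.csv"), row.names = FALSE)
write.csv(data.frame(country = c("Algeria", "Austria")),
          file.path(tmp_dir, "nations.csv"), row.names = FALSE)

# vector of file paths (only .csv files)
filenames <- list.files(path = tmp_dir, pattern = "\\.csv$",
                        full.names = TRUE)

all_data <- filenames |>
  # name each path by its file name, without folder or extension
  purrr::set_names(tools::file_path_sans_ext(basename(filenames))) |>
  # read every file into a named list of data frames
  purrr::map(.f = read.csv)

names(all_data)
# [1] "hsb2"    "nations"
```
]

.font80[With `read_sas()` and a `"\\.sas7bdat$"` pattern instead, a dataset could then be pulled out with `all_sas_data$hsb2` rather than `all_sas_data[["../data/sas/hsb2.sas7bdat"]]`.]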