layout: true

<!-- this adds the link footer to all slides, depends on footer-small class in css-->

<div class="footer-small"><span>https://github.com/mjfrigaard/ph-lacounty-r/</span></div>

---
name: title-slide
class: title-slide, center, middle, inverse

# Common Data Objects in R

#.fancy[Vectors, Lists, Data Frames and Tibbles]

<br>

.large[by Martin Frigaard]

Written: October 03 2022

Updated: December 02 2022

.footer-large[.right[.fira[
<br><br><br><br><br>[Created using the "λέξις" theme](https://jhelvy.github.io/lexis/index.html#what-does-%CE%BB%CE%AD%CE%BE%CE%B9%CF%82-mean)
]]]

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Materials

The slides are in the `slides.pdf` file

--

The materials for this training are in the `worksheets` folder:

```
worksheets
├── import.Rmd
├── export.Rmd
├── objects.Rmd
├── rmd-basic.Rmd
├── rmd-tables.Rmd
└── rmd-visualizations.Rmd
```

---
background-image: url(www/pdg-hex.png)
background-position: 96% 4%
background-size: 6%

# Outline

<br>

.leftcol[

#### 1. Importing Data

#### 2. .red[Common Data Objects]

#### 3. R Markdown

]

--

.rightcol[

#### 4. R Markdown Data Visualizations

#### 5. R Markdown Tables

#### 6. Exporting Data

]

---
background-image: url(www/pdg-hex.png)
class: center, middle, inverse
background-position: 96% 4%
background-size: 6%

# .large[Common Data Objects]

--

<br><br>

.font90[.green[Open `objects.Rmd` to follow along]]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: vector

#### Vectors are the fundamental data object in R

<img src="www/atomic-vectors.png" width="85%" height="85%" style="display: block; margin: auto;" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: creating vectors

<br>

`c()` combines (or concatenates) elements into a vector

--

`<-` is the assignment operator; use it with `c()` to assign the combined elements to a named object

--

*Earlier we used `<-` to create the `medical` dataset*

.leftcol[

Create logical and integer vectors (`log_vec` and `int_vec`)

```r
log_vec <- c(TRUE, FALSE)
int_vec <- c(4L, 7L)
```

]

.rightcol[

Create double and character vectors (`dbl_vec` and `chr_vec`)

```r
dbl_vec <- c(2.2, 8.09)
chr_vec <- c("A", "D")
```

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: atomic vectors

.cols3[

.font80[Print Atomic Vectors]

.code60[

```r
log_vec
```

```
[1]  TRUE FALSE
```

```r
int_vec
```

```
[1] 4 7
```

```r
dbl_vec
```

```
[1] 2.20 8.09
```

```r
chr_vec
```

```
[1] "A" "D"
```

]

]

--

.cols3[

.font80[Check with `typeof()`]

.code60[

```r
typeof(log_vec)
```

```
[1] "logical"
```

```r
typeof(int_vec)
```

```
[1] "integer"
```

```r
typeof(dbl_vec)
```

```
[1] "double"
```

```r
typeof(chr_vec)
```

```
[1] "character"
```

]

]

--

.cols3[

.font80[Check `class()`]

.code60[

```r
class(log_vec)
```

```
[1] "logical"
```

```r
class(int_vec)
```

```
[1] "integer"
```

```r
*class(dbl_vec)
```

```
[1] "numeric"
```

```r
class(chr_vec)
```

```
[1] "character"
```

]

]

---
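background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: combining types

.font90[A minimal sketch of what happens when `c()` is given elements of different types. Because an atomic vector holds a single type, R coerces every element to the most flexible type present (logical → integer → double → character). The object names `mix_num` and `mix_chr` are made up for this example and do not appear in the worksheets.]

```r
# integer and double elements (hypothetical example object)
mix_num <- c(4L, 2.2)
typeof(mix_num)  # "double"

# logical, double, and character elements (hypothetical example object)
mix_chr <- c(TRUE, 2.2, "A")
typeof(mix_chr)  # "character"

# every element is now a string
mix_chr
```

.font90[Checking `typeof()` after combining values is a quick way to catch an unintended coercion, such as numbers that were silently turned into text.]

---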
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: S3 vectors

<img src="www/s3-vectors.png" width="85%" height="85%" style="display: block; margin: auto;" />

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: S3 vectors

.leftcol[

.font80[Create S3 Vectors]

.code60[

```r
fct_vec <- factor(
  x = c("Medium", "Low", "High"),
  levels = c("Low", "Medium", "High"))
date_vec <- c(Sys.Date(), Sys.Date() + 1)
dt_vec <- c(Sys.time(), Sys.time() + (86400*365))
difft_vec <- difftime(
  time1 = Sys.time(),
  time2 = Sys.time() + (86400*365),
  units = "days")
```

]

]

.rightcol[

.font80[View S3 vectors]

.code55[

```r
fct_vec
```

```
[1] Medium Low    High  
Levels: Low Medium High
```

```r
date_vec
```

```
[1] "2022-12-02" "2022-12-03"
```

```r
dt_vec
```

```
[1] "2022-12-02 09:33:48 PST" "2023-12-02 09:33:48 PST"
```

```r
difft_vec
```

```
Time difference of -365 days
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: S3 vectors

.leftcol[

.font80[Check `typeof()`]

.code60[

```r
typeof(fct_vec)
```

```
[1] "integer"
```

```r
typeof(date_vec)
```

```
[1] "double"
```

```r
typeof(dt_vec)
```

```
[1] "double"
```

```r
typeof(difft_vec)
```

```
[1] "double"
```

]

]

.rightcol[

.font80[Check `class()`]

.code60[

```r
class(fct_vec)
```

```
[1] "factor"
```

```r
class(date_vec)
```

```
[1] "Date"
```

```r
class(dt_vec)
```

```
[1] "POSIXct" "POSIXt" 
```

```r
class(difft_vec)
```

```
[1] "difftime"
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: S3 vectors

.font90[**S3 vectors have additional `attributes()`**]

--

.cols3[

.font80[Factor attributes]

.code60[

```r
attributes(fct_vec)
```

```
$levels
[1] "Low"    "Medium" "High"  

$class
[1] "factor"
```

]

]

--

.cols3[

.font80[Date/Datetime attributes]
.code60[

```r
attributes(date_vec)
```

```
$class
[1] "Date"
```

```r
attributes(dt_vec)
```

```
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] ""
```

]

]

--

.cols3[

.font80[Difftime attributes]

.code60[

```r
attributes(difft_vec)
```

```
$class
[1] "difftime"

$units
[1] "days"
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: lists

.font90[**All elements of an atomic vector must share the same type**]

--

.font90[**Lists can hold objects of different types and `class`es**]

--

.leftcol40[

```r
atomic_list <- list(
  'logical vector' = log_vec,
  'integer vector' = int_vec,
  'double vector' = dbl_vec,
  'character vector' = chr_vec
  )
```

]

--

.rightcol60[

```r
atomic_list
```

```
$`logical vector`
[1]  TRUE FALSE

$`integer vector`
[1] 4 7

$`double vector`
[1] 2.20 8.09

$`character vector`
[1] "A" "D"
```

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: lists

.font90[**Lists can even contain other lists!**]

--

.leftcol[

.font80[Create a list of date and time vectors]

.code60[

```r
s3_list <- list(
  'date vector' = date_vec,
  'datetime vector' = dt_vec,
  'difftime vector' = difft_vec
  )
```

]

.font80[Create a list of lists]

.code60[

```r
vector_list <- list(
  'S3 list' = s3_list,
  'Atomic list' = atomic_list
  )
```

]

]

--

.rightcol[

.code50[

```r
vector_list
```

```
$`S3 list`
$`S3 list`$`date vector`
[1] "2022-12-02" "2022-12-03"

$`S3 list`$`datetime vector`
[1] "2022-12-02 09:33:48 PST" "2023-12-02 09:33:48 PST"

$`S3 list`$`difftime vector`
Time difference of -365 days


$`Atomic list`
$`Atomic list`$`logical vector`
[1]  TRUE FALSE

$`Atomic list`$`integer vector`
[1] 4 7

$`Atomic list`$`double vector`
[1] 2.20 8.09

$`Atomic list`$`character vector`
[1] "A" "D"
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: data.frames

.font120[**A `data.frame` is a rectangular list: each column is a vector of the same length**]

.leftcol30[
.font80[Create `data.frame`]

.code60[

```r
my_df <- data.frame(
  log_col = log_vec,
  int_col = int_vec,
  dbl_col = dbl_vec,
  chr_col = chr_vec,
  date_col = date_vec,
  dt_col = dt_vec
  )
```

]

]

--

.rightcol70[

.font80[View `data.frame`]

.code60[

```r
my_df
```

```
  log_col int_col dbl_col chr_col   date_col              dt_col
1    TRUE       4    2.20       A 2022-12-02 2022-12-02 09:33:48
2   FALSE       7    8.09       D 2022-12-03 2023-12-02 09:33:48
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: data.frames

.font90[**Check the structure of the `data.frame`**]

.code60[

```r
str(my_df)
```

```
'data.frame': 2 obs. of  6 variables:
 $ log_col : logi  TRUE FALSE
 $ int_col : int  4 7
 $ dbl_col : num  2.2 8.09
 $ chr_col : chr  "A" "D"
 $ date_col: Date, format: "2022-12-02" "2022-12-03"
 $ dt_col  : POSIXct, format: "2022-12-02 09:33:48" "2023-12-02 09:33:48"
```

]

--

.font90[**Check the `class()` and `typeof()` of the `data.frame`**]

.leftcol[

.code60[

```r
class(my_df)
```

```
[1] "data.frame"
```

]

]

--

.rightcol[

.code60[

```r
typeof(my_df)
```

```
[1] "list"
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: tibbles

.font90[**A tibble is a [modern reimagining](https://tibble.tidyverse.org/) of the `data.frame`**]

--

.font90[Tibbles are created just like `data.frame`s, using `tibble()` from the tibble package]

--

.leftcol30[

.font80[Create `tibble`]

.code60[

```r
library(tibble)
my_tbl <- tibble(
  log_col = log_vec,
  int_col = int_vec,
  dbl_col = dbl_vec,
  chr_col = chr_vec,
  date_col = date_vec,
  dt_col = dt_vec
  )
```

]

]

--

.rightcol70[

.font80[View `tibble`]

.code60[

```r
my_tbl
```

```
# A tibble: 2 × 6
  log_col int_col dbl_col chr_col date_col   dt_col             
  <lgl>     <int>   <dbl> <chr>   <date>     <dttm>             
1 TRUE          4    2.2  A       2022-12-02 2022-12-02 09:33:48
2 FALSE         7    8.09 D       2022-12-03 2023-12-02 09:33:48
```

]

]

---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: data.frames & tibbles

.font90[`tibble`s print more informatively than `data.frame`s (the dimensions and column types are shown), and we'll primarily use them because they work well with the functions we'll use for tables and visualizations.]

--

.code80[

```r
my_df
```

```
  log_col int_col dbl_col chr_col   date_col              dt_col
1    TRUE       4    2.20       A 2022-12-02 2022-12-02 09:33:48
2   FALSE       7    8.09       D 2022-12-03 2023-12-02 09:33:48
```

]

--

.code80[

```r
my_tbl
```

```
# A tibble: 2 × 6
  log_col int_col dbl_col chr_col date_col   dt_col             
  <lgl>     <int>   <dbl> <chr>   <date>     <dttm>             
1 TRUE          4    2.2  A       2022-12-02 2022-12-02 09:33:48
2 FALSE         7    8.09 D       2022-12-03 2023-12-02 09:33:48
```

]
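---
background-image: url(www/pdg-hex.png)
class: left, top
background-position: 96% 4%
background-size: 6%

# Data Objects: data.frames & tibbles

.font90[Printing is not the only difference. The rough sketch below builds two small stand-in objects, `demo_df` and `demo_tbl` (hypothetical names, not part of the worksheets), and assumes the tibble package is installed. It shows that both objects are lists underneath, and that selecting a single column with the square-bracket operator simplifies a `data.frame` to a vector but keeps a tibble as a tibble.]

.code60[

```r
library(tibble)

# small stand-in objects (hypothetical names, not from the worksheets)
demo_df  <- data.frame(int_col = c(4L, 7L), chr_col = c("A", "D"))
demo_tbl <- as_tibble(demo_df)

# both are lists underneath
typeof(demo_df)
typeof(demo_tbl)

# a tibble adds classes on top of "data.frame"
class(demo_tbl)

# single-column subsetting: data.frame simplifies to a vector
class(demo_df[, "int_col"])

# single-column subsetting: tibble stays a tibble
class(demo_tbl[, "int_col"])
```

]

.font90[Because a tibble always returns a tibble here, code written for tibbles behaves the same whether one column or several are selected.]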