layout: true

<!-- this adds the link footer to all slides, depends on footer-small class in css-->

<div class="footer-small"><span>https://github.com/mjfrigaard/csuc-data-journalism</span></div>

---
name: title-slide
class: title-slide, center, middle, inverse

# Introduction To R Programming

#.fancy[R objects and functions]

<br>

.large[by Martin Frigaard]

Written: September 30 2021

Updated: November 30 2021

.footer-large[.right[.fira[
<br><br><br><br><br>[Created using the "λέξις" theme](https://jhelvy.github.io/lexis/index.html#what-does-%CE%BB%CE%AD%CE%BE%CE%B9%CF%82-mean)
]]]

---
class: center, middle

# R Programming

### R is a versatile language for data wrangling, visualization, and modeling

---
class: left, top

# Resources

## Link to slides

https://mjfrigaard.github.io/csuc-data-journalism/slides.html

## Link to exercises

https://mjfrigaard.github.io/csuc-data-journalism/lessons-exercises.html

---
background-image: url("https://www.r-project.org/logo/Rlogo.png")
background-size: contain
class: inverse, center, middle

# Getting Started

.rightcol[Image credit: [R Project](https://www.r-project.org)]

---
class: left, top

# Installing R

Install R from the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/

<img src="img/cran.png" width="70%" height="70%" style="display: block; margin: auto;" />

--

We recommend using the [RStudio IDE](https://www.rstudio.com/products/rstudio/) (*but you do not have to*).
---
class: left, top

# Download RStudio

https://rstudio.com/products/rstudio/download/

<img src="img/rstudio.png" width="70%" height="70%" />

---
class: left, top

# Or use RStudio.Cloud

https://rstudio.cloud/

<img src="img/rstudio-cloud.png" width="70%" height="70%" />

---
background-image: url("img/r-console.png")
background-size: contain
class: center, bottom

# .red[The R Console]

---
background-image: url("img/rstudio-launch.png")
background-size: contain
class: left, middle

# .red[The RStudio IDE]

---
class: left, top

# Running R Commands

You can run R commands in the Console by entering them after the `>` prompt (see example in R below)

.leftcol[

```r
print("Hello World")
```

```
[1] "Hello World"
```

]

--

.rightcol[

<img src="img/commands-console.png" width="90%" height="90%" style="display: block; margin: auto;" />

]

---
class: left

# Running R Commands

You can also run them in R scripts (see example in RStudio below)

--

<img src="img/script-rstudio.png" width="110%" height="110%" style="display: block; margin: auto;" />

---
class: left, middle

# R Syntax

--

### R syntax comprises two major elements:

--

.leftcol[

## Functions

#### *Functions perform operations: calculate a mean, build a table, create a graph, etc.*

]

--

.rightcol[

## Objects

#### *Objects hold information: a collection of numbers, dates, words, model results, etc.*

]

---
class: inverse, center, middle

# We use .yellow[functions] to perform operations on .yellow[objects]

---
class: left, top

# Example: create a vector of numbers

The standard assignment operator in R is `<-`. We can use this in combination with `c()` to create an object `x`, which contains five numbers (`1`, `3`, `5`, `7`, `9`).
```r
*x <- c(1, 3, 5, 7, 9)
```

--

Place `x` inside `print()` to print `x` to the console

```r
x <- c(1, 3, 5, 7, 9)
*print(x)
```

NOTE: We can also assign with `=`, or use `->` with the object name at the end of the expression, but neither is recommended

---
class: left

# R Syntax: functions

```r
x <- c(1, 3, 5, 7, 9)
print(x)
```

```
[1] 1 3 5 7 9
```

In the example above, we've created object `x`, but what are `<-` and `c()`?

--

We can check this by passing them both in backticks to the `class()` function below.

--

```r
class(`<-`)
```

```
[1] "function"
```

--

```r
class(`c`)
```

```
[1] "function"
```

---
class: left

# Functions in R

> ***Functions*** perform operations (calculate, model, graph, etc.) on various ***objects*** that contain information (blood pressures, sales, political party affiliation, etc.)

.leftcol[

Objects are similar to nouns: they hold information

```r
object_1 <- "Sally"
object_2 <- "dog"
object_3 <- "road"
```

]

.rightcol[

Functions are similar to verbs: they do things to nouns

```r
work()
run()
implement()
```

]

---
class: left

# Functions and objects

Functions perform operations on objects.

.cols3[

```r
sally_object <- "Sally"
*work(sally_object)
```

> *Sally works.*

]

.cols3[

```r
dog_object <- "dog"
*run(dog_object)
```

> *The dog runs.*

]

.cols3[

```r
idea_object <- "idea"
*implement(idea_object)
```

> *Implement the idea.*

]

---
class: left, top

## Packages and functions in R

.leftcol[

Functions are stored in R packages. Fortunately, R comes 'out-of-the-box' with a set of functions for basic data management and statistical calculations.

To access the functions in a package, use the following syntax:

```r
package::function(object)
```

]

.rightcol[

The `median()` function comes from the `stats` package.

```r
stats::median(x)
```

```
[1] 5
```

The `typeof()` function comes from the `base` package.
```r
base::typeof(x)
```

```
[1] "double"
```

]

---
class: left

# Packages and functions

Use tab-completion and the arrow keys in RStudio to explore a package's functions.

<img src="img/tab-completion.png" width="80%" height="80%" style="display: block; margin: auto;" />

--

We can take advantage of tab-completion by using names that allow us to look up common objects. For example, naming plot objects with a `plot_` prefix will allow us to use tab-completion to scroll through each object without having to remember the specific name.

---
class: left

# Installing packages from CRAN

To install packages from CRAN, we can use the `install.packages()` function.

```r
install.packages("package name")
```

NOTE: *if this is the first time installing packages, you'll probably be presented with a list of CRAN “mirrors” to use; choose the mirror closest to you.*

--

To load the package into your environment, use `library(package name)`

```r
library(package name)
```

---
class: left, top

# Installing packages from CRAN in RStudio

You can also use the **Packages** pane in RStudio

<img src="img/install-package.png" width="40%" height="40%" style="display: block; margin: auto;" />

---
class: left

# Installing user packages

The code for user-written packages is typically stored in a code repository, like [GitHub](https://github.com/).

.leftcol40[

.small[
To access user-written packages, you'll need to install the `devtools` or `remotes` packages.
]

```r
install.packages("devtools")
install.packages("remotes")
```

]

.rightcol60[

.small[
Use `devtools::install_github()` or `remotes::install_github()` (with the author's username and package repository name, passed as a quoted string)
]

```r
devtools::install_github("<username>/<package>")
remotes::install_github("<username>/<package>")
```

]

---
class: inverse, left, middle

# Objects

--

### *R is typically referred to as an "object-oriented programming" language*

--

### *We've covered functions, so now we'll dive into the aspects of some common R objects*

---
class: left, top

# Types of objects in R

.leftcol[

- **Vectors**
  - atomic (logical, integer, double, and character)
  - S3 (factors, dates, date-times, durations)

- **Matrices**
  - two dimensional objects

]

.rightcol[

- **Arrays**
  - multidimensional objects

- **Data frames & tibbles**
  - rectangular objects

- **Lists**
  - recursive objects

]

---
class: left, top

# Atomic vectors

Vectors are the fundamental data type in R.

--

Many of R's functions are *vectorised*, which means they're designed for performing operations on vectors.

--

The "atomic" in atomic vectors means, "*of or forming a single irreducible unit or component in a larger system.*"

--

Atomic vectors can be logical, integer, double, or character (strings).

--

We will build each of these vectors using the previously covered assignment operator (`<-`) and `c()` function (*which stands for 'combine'*).

---
class: left, top

# Store and explore

--

.leftcol[

A common practice in R is to create an object, perform an operation on that object with a function, and store the results in a new object.

We then explore the contents of the new object with another function.

]

--

.rightcol[.border[

<img src="img/store-explore.png"/>

]
]

--

***

Many of the functions in R are written with this *store and explore* process in mind.

---
class: left, top

# Atomic vectors: numeric

The two atomic numeric vectors are integer and double.
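
A quick way to see the difference, as a minimal sketch in base R (`num_plain` and `num_whole` are hypothetical names used only here): a bare number is stored as a double, while the `L` suffix asks for an integer.

```r
# Hypothetical object names, for illustration only
num_plain <- 1    # no suffix: stored as a double
num_whole <- 1L   # L suffix: stored as an integer

typeof(num_plain)
typeof(num_whole)

# Both types count as numeric
is.numeric(num_plain)
is.numeric(num_whole)
```

`typeof()` should report `"double"` for the first object and `"integer"` for the second, while `is.numeric()` is `TRUE` for both.
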
--

Integer vectors are created with a number and capital letter `L` (e.g. `1L`, `10L`)

```r
vec_integer <- c(1L, 10L, 100L)
```

--

Double vectors can be entered as decimals, but they can also be created in scientific notation (`2.46e8`), or values determined by the floating point standard (`Inf`, `-Inf` and `NaN`).

```r
vec_double <- c(0.1, 1.0, 10.01)
```

---
class: left, top

# Atomic vectors: numeric

We will use the `typeof()` and `is.numeric()` functions to explore the contents of `vec_integer`; the same checks work on `vec_double`.

```r
typeof(vec_integer)
```

```
[1] "integer"
```

--

```r
is.numeric(vec_integer)
```

```
[1] TRUE
```

--

`typeof()` tells us that this is an `"integer"` vector, and `is.numeric()` tests to see if it is numeric (which is `TRUE`).

---
class: left, top

# Atomic vectors: logical vectors

Logical vectors can be `TRUE` or `FALSE` (or `T` or `F` for short). Below we use `typeof()` and `is.logical()` to explore the contents of `vec_logical`.

--

```r
vec_logical <- c(TRUE, FALSE)
typeof(vec_logical)
```

```
[1] "logical"
```

--

```r
is.logical(vec_logical)
```

```
[1] TRUE
```

---
class: left, top

# Atomic vectors: logical vectors

Logical vectors are handy because when we add them together, the total tells us how many `TRUE` values there are.

```r
TRUE + TRUE + FALSE + TRUE
```

```
[1] 3
```

--

Logical vectors can be useful for subsetting (a way of extracting certain elements from a particular object) based on a set of conditions.

--

*Which elements in `vec_integer` are greater than `5`?*

```r
vec_integer > 5
```

```
[1] FALSE  TRUE  TRUE
```

---
class: left, top

# Atomic vectors: character vectors

Character vectors store text data (note the double quotes). We'll *store and explore* again.
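
The quotes are what make a value text. As a small sketch in base R (the values are made up for illustration), a number wrapped in quotes is stored as character, and single and double quotes produce the same string.

```r
# Illustrative values only
typeof(10)     # a bare number
typeof("10")   # the same digits in quotes are text

# Single and double quotes create identical strings
identical('A', "A")
```
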
--

```r
vec_character <- c("A", "B", "C")
typeof(vec_character)
```

```
[1] "character"
```

--

```r
is.character(vec_character)
```

```
[1] TRUE
```

--

Character vectors typically store text information that we need to include in a calculation, visualization, or model. In these cases, we'll need to convert them into `factor`s. We'll cover those next.

---
class: inverse, center, middle

# S3 vectors

### *S3 vectors can be factors, dates, date-times, and difftimes.*

---
class: left, top

# S3 vectors: factors

Factors are categorical vectors with a given set of responses. Below we create a factor with three levels: `low`, `medium`, and `high`

--

```r
vec_factor <- factor(x = c("low", "medium", "high"))
class(vec_factor)
```

```
[1] "factor"
```

--

Factors are not character variables, though. They get stored with an integer indicator for each character level.

```r
typeof(vec_factor)
```

```
[1] "integer"
```

---
class: left, top

# S3 vectors: factor attributes

Factors are integer vectors with two additional attributes: `class`, which is set to `factor`, and `levels`, which holds each unique response.

--

We can check this with the `unique()` and `attributes()` functions.

```r
unique(vec_factor)
```

```
[1] low    medium high
Levels: high low medium
```

--

```r
attributes(vec_factor)
```

```
$levels
[1] "high"   "low"    "medium"

$class
[1] "factor"
```

---
class: left, top

# S3 vectors: factor attributes

.leftcol[

.small[Levels are assigned alphabetically by default, but we can manually assign the order of factor levels with the `levels` argument in `factor()`.]

```r
vec_factor <- factor(
  x = c("medium", "high", "low"),
  levels = c("low", "medium", "high"))
```

]

.rightcol[

.small[We can check the levels with `levels()` or `unclass()`]

```r
levels(vec_factor)
```

```
[1] "low"    "medium" "high"
```

```r
unclass(vec_factor)
```

```
[1] 2 3 1
attr(,"levels")
[1] "low"    "medium" "high"
```

]

---
class: left, top

# S3 vectors: date

Dates are stored as `double` vectors with a `class` attribute set to `Date`.
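
One way to check this claim is with a fixed date, sketched below in base R. `fixed_date` is a hypothetical name, and the date itself is only an example.

```r
# fixed_date is a hypothetical name used only for this check
fixed_date <- as.Date("2021-11-30")

typeof(fixed_date)  # how the value is stored
class(fixed_date)   # the S3 class attribute
```

`typeof()` should report `"double"` and `class()` should report `"Date"`.
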
.leftcol60[

.small[
R has a function for getting today's date, `Sys.Date()`. We'll create a `vec_date` using `Sys.Date()` and adding `1` and `2` to this value.
]

```r
vec_date <- c(Sys.Date(),
              Sys.Date() + 1,
              Sys.Date() + 2)
vec_date
```

```
[1] "2021-11-30" "2021-12-01" "2021-12-02"
```

]

.rightcol40[

.small[
We can see that adding units to `Sys.Date()` added days to today's date. The `attributes()` function tells us this vector has its own class.
]

```r
attributes(vec_date)
```

```
$class
[1] "Date"
```

]

---
class: left, top

# S3 vectors: date calculations

Dates are stored as a number because they represent the number of days since January 1, 1970, which is referred to as the [UNIX Epoch](https://en.wikipedia.org/wiki/Unix_time).

--

`unclass()` tells us what the actual number is.

```r
unclass(vec_date)
```

```
[1] 18961 18962 18963
```

---
class: left, top

# S3 vectors: date-time

Date-times contain a bit more information than dates. The function to create a date-time vector is `as.POSIXct()`. We'll convert `vec_date` to a date-time and store it in `vec_datetime_ct`. View the results below.

--

```r
vec_date
```

```
[1] "2021-11-30" "2021-12-01" "2021-12-02"
```

--

```r
vec_datetime_ct <- as.POSIXct(x = vec_date)
vec_datetime_ct
```

```
[1] "2021-11-29 17:00:00 MST" "2021-11-30 17:00:00 MST"
[3] "2021-12-01 17:00:00 MST"
```

We can see `vec_datetime_ct` stores some additional information: a time of day and a time zone.

---
class: left, top

# S3 vectors: date-time attributes

`vec_datetime_ct` is a `double` vector with an additional attribute of `class` set to `"POSIXct" "POSIXt"`.

```r
typeof(vec_datetime_ct)
```

```
[1] "double"
```

--

```r
attributes(vec_datetime_ct)
```

```
$class
[1] "POSIXct" "POSIXt"
```

---
class: left, top

# S3 vectors: date-time help

.leftcol[

Read more about date-times by entering the `as.POSIXct` function into the console preceded by a question mark.
```r
?as.POSIXct
```

]

--

.rightcol[

<img src="img/help-date-time.png" width="70%" height="70%" style="display: block; margin: auto;" />

]

---
class: left, top

# S3 vectors: difftime

.leftcol[

Difftimes are durations, so we create one from two dates, `time_01` and `time_02`:

```r
time_01 <- Sys.Date()
time_02 <- Sys.Date() + 10
time_01
```

```
[1] "2021-11-30"
```

```r
time_02
```

```
[1] "2021-12-10"
```

]

.rightcol[

Difftimes are stored as a `double` vector.

```r
vec_difftime <- difftime(time_01, time_02, units = "days")
vec_difftime
```

```
Time difference of -10 days
```

```r
typeof(vec_difftime)
```

```
[1] "double"
```

]

---
class: left, top

# S3 vectors: difftime attributes

.leftcol[

Difftimes are their own `class` and have a `units` attribute set to whatever we've specified in the `units` argument.

```r
attributes(vec_difftime)
```

```
$class
[1] "difftime"

$units
[1] "days"
```

]

.rightcol[

We can see the actual number stored in the vector with `unclass()`

```r
unclass(vec_difftime)
```

```
[1] -10
attr(,"units")
[1] "days"
```

]

---
class: left, top

# Matrices

.leftcol[

A matrix is several vectors stored together in a two-dimensional object.

```r
mat_data <- matrix(
  data = c(vec_double, vec_integer),
  nrow = 3, ncol = 2, byrow = FALSE)
mat_data
```

```
      [,1] [,2]
[1,]  0.10    1
[2,]  1.00   10
[3,] 10.01  100
```

]

.rightcol[

We can check the dimensions of `mat_data` with `dim()`.

```r
dim(mat_data)
```

```
[1] 3 2
```

This is a three-row, two-column matrix (`dim()` reports rows first, then columns).

]

---
class: left, top

# Matrix positions

The output in the console tells us where each element is located in `mat_data`.

.leftcol[

For example, if I want to get the `10` that's stored in `vec_integer`, I can look at the output and use the indexes.

```r
mat_data
```

```
      [,1] [,2]
[1,]  0.10    1
[2,]  1.00   10
[3,] 10.01  100
```

]

.rightcol[

By placing the index (`[2, 2]`) next to the object, I am telling R, "*only return the value in this position*".
```r
mat_data[2, 2]
```

```
[1] 10
```

]

---
class: left, top

# Arrays

.leftcol[

Arrays are like matrices, but they can have more dimensions.

```r
dat_array <- array(
  data = c(1, 2, 3, 4, 5, 6, 7, 8, 9,
           10, 11, 12, 13, 14, 15, 16, 17, 18),
  dim = c(3, 3, 2))
```

]

.rightcol[

```r
dat_array
```

```
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
```

]

---
class: left, top

# Array layers

.leftcol[.border[

`dat_array` contains numbers 1 through 18 in three columns and three rows, stacked in two *layers*.

<!-- <img src="img/array.png"/> -->

<img src="img/array.png" width="55%" height="55%" style="display: block; margin: auto;" />

]]

.rightcol[

```r
class(dat_array)
```

```
[1] "array"
```

```r
class(mat_data)
```

```
[1] "matrix" "array"
```

> Matrices are arrays, but not all arrays
> are matrices

]

---
class: left, top

# Data Frames

.leftcol[

Data frames are rectangular data with rows and columns (or observations and variables).

```r
DataFrame <- data.frame(
  character = c("A", "B", "C"),
  integer = c(0.1, 1.0, 10.01),
  logical = c(TRUE, FALSE, TRUE))
```

]

.rightcol[

```r
DataFrame
```

```
  character integer logical
1         A    0.10    TRUE
2         B    1.00   FALSE
3         C   10.01    TRUE
```

.small[

> NOTE: `stringsAsFactors = FALSE` is not
> required as of R version 4.0.0.

]

]

---
class: left

# Data Frames

.leftcol[

Check the structure of the `data.frame` with `str()`

```r
str(DataFrame)
```

```
'data.frame':	3 obs. of  3 variables:
 $ character: chr  "A" "B" "C"
 $ integer  : num  0.1 1 10
 $ logical  : logi  TRUE FALSE TRUE
```

]

.rightcol[

`str()` gives us a transposed view of the `DataFrame` object, and tells us the dimensions of the object.

]

---
class: left

# Tibbles

.leftcol[

Tibbles are a special kind of `data.frame` (*they print better to the console and character vectors are never coerced into factors*).
```r
Tibble <- tibble::tribble(
*      ~character, ~integer,  ~logical,
              "A",      0.1,      TRUE,
              "B",        1,     FALSE,
              "C",    10.01,      TRUE)
```

]

.rightcol[

The syntax to build them is slightly different, too.

```r
Tibble
```

```
# A tibble: 3 × 3
  character integer logical
  <chr>       <dbl> <lgl>
1 A             0.1 TRUE
2 B             1   FALSE
3 C            10.0 TRUE
```

]

---
class: left

# Tibbles

.leftcol[

Check the structure of `Tibble`.

```r
str(Tibble)
```

```
tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
 $ character: chr [1:3] "A" "B" "C"
 $ integer  : num [1:3] 0.1 1 10
 $ logical  : logi [1:3] TRUE FALSE TRUE
```

]

.rightcol[

<br><br>

`str()` tells us `tibbles` are `S3` objects, with classes `tbl_df`, `tbl`, and `data.frame`.

]

---
class: left, top

# Data frames and tibbles

If you're importing spreadsheets, most of the work you'll do in R will be with rectangular data objects (i.e. `data.frame`s and `tibble`s).

--

.leftcol[.border[

<img src="img/data-frame-tibble.png" width="60%" height="60%" style="display: block; margin: auto;" />

]
]

.rightcol[

<br><br>

*These are the common rectangular data storage objects for tabular data in R*

]

---
class: left

# Data frames & tibbles

.leftcol[.small[

```r
DataFrame
```

```
  character integer logical
1         A    0.10    TRUE
2         B    1.00   FALSE
3         C   10.01    TRUE
```

> the `data.frame` prints the column names and contents

]]

.rightcol[.small[

```r
Tibble
```

```
# A tibble: 3 × 3
  character integer logical
  <chr>       <dbl> <lgl>
1 A             0.1 TRUE
2 B             1   FALSE
3 C            10.0 TRUE
```

> the `tibble` prints the column names, dimensions, formats, and contents

]]

---
class: left

# Data frames & tibbles

If we check the `type` of the `DataFrame` and `Tibble`...
.leftcol[

```r
typeof(DataFrame)
```

```
[1] "list"
```

]

.rightcol[

```r
typeof(Tibble)
```

```
[1] "list"
```

]

--

<br><br><br><br>

> ...we see they are `lists`

---
class: left

# Data Frames & Tibbles

Both `data.frame`s and `tibble`s have their own class:

.leftcol40[

```r
class(DataFrame)
```

```
[1] "data.frame"
```

]

.rightcol60[

```r
class(Tibble)
```

```
[1] "tbl_df"     "tbl"        "data.frame"
```

]

<br><br><br><br><br>

> So we can think of `data.frame`s and `tibble`s as special kinds of *rectangular* lists, made with different types of vectors, with each vector being of equal length.

---
class: left

# Lists

.leftcol55[

Lists are special objects because they can contain all other objects (including other lists).

.small[

```r
dat_list <- list(
  "integer" = vec_integer,
  "array" = dat_array,
  "matrix data" = mat_data,
  "data frame" = DataFrame,
  "tibble" = Tibble)
```

]]

.rightcol45[

Lists have a `names` attribute, which we've defined above in double quotes.

.small[

```r
attributes(dat_list)
```

```
$names
[1] "integer"     "array"       "matrix data" "data frame"  "tibble"
```

]]

---
class: left

# List structure

If we check the structure of `dat_list`, we see the structure of the list, and the structure of the elements in the list.

```r
str(dat_list)
```

```
List of 5
 $ integer    : int [1:3] 1 10 100
 $ array      : num [1:3, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
 $ matrix data: num [1:3, 1:2] 0.1 1 10 1 10 ...
 $ data frame :'data.frame':	3 obs. of  3 variables:
  ..$ character: chr [1:3] "A" "B" "C"
  ..$ integer  : num [1:3] 0.1 1 10
  ..$ logical  : logi [1:3] TRUE FALSE TRUE
 $ tibble     : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ character: chr [1:3] "A" "B" "C"
  ..$ integer  : num [1:3] 0.1 1 10
  ..$ logical  : logi [1:3] TRUE FALSE TRUE
```

---
class: left, top

# Recap

**R has two major elements: functions and objects.**

- *functions are verbs, objects are nouns*

--

**Packages: use `install.packages()` and `library()` to install and load functions from packages**

- *or `devtools::install_github("<username>/<package>")` or `remotes::install_github("<username>/<package>")`*

--

**The most common R object is a vector**

- Atomic vectors: *logical, integer, double, or character (strings)*

- S3 vectors: *factors, dates, date-times, and difftimes*

---
class: left, top

# Recap, cont.

**More complicated data structures: matrices and arrays**

- Matrix: *two-dimensional object*

- Array: *multidimensional object*

--

**Rectangular data structures:**

- *`data.frame`s & `tibble`s are special kinds of rectangular lists, which can hold different types of vectors, with each vector being of equal length*

--

**Catch-all data structures:**

- *lists can contain all other objects (including other lists)*

---
class: left, top

# More resources

Learn more about R objects in the help files or the following online texts:

1. [R for Data Science](https://r4ds.had.co.nz/)

2. [Advanced R](https://adv-r.hadley.nz/)

3. [Hands on Programming with R](https://rstudio-education.github.io/hopr/r-objects.html)

4. [R Language Definition](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects)

---
class: center, top

# THANK YOU!

## Feedback

@mjfrigaard on Twitter and GitHub

mjfrigaard@gmail.com