In the previous .Rmd, we downloaded the data table from the Texas Department of Criminal Justice website, which keeps a record of every inmate the state has executed.
The scraped data were exported from that .Rmd and are stored in the folder below.
fs::dir_tree("../data/wk10-dont-mess-with-texas/")
## ../data/wk10-dont-mess-with-texas/
## ├── 2021-11-21-ExecutedOffenders.csv
## ├── 2021-11-30-ExecutedOffenders.csv
## └── processed
##     ├── 2021-11-21
##     │   ├── 2021-11-21-ExExOffndrshtml.csv
##     │   ├── 2021-11-21-ExExOffndrsjpg.csv
##     │   └── ExOffndrsComplete.csv
##     └── 2021-11-30
##         └── ExOffndrsComplete.csv
This will import the most recent data.
# fs::dir_ls("data/processed/2021-10-25")
ExecOffenders <- readr::read_csv("https://bit.ly/2Z7pKTI")
ExOffndrsComplete <- readr::read_csv("https://bit.ly/3oLZdEm")
In this post, we will use purrr’s iteration tools to download the images attached to the website profiles.
purrr’s iteration tools to download the .jpg files
Follow these three purrr steps from the workshop by Charlotte Wickham. We’ll go over them below:
We can test the new URL column (info_url) in ExecOffenders with the magick::image_read() function.
library(magick)
test_image <- ExecOffenders %>% 
  # only jpg row
  dplyr::filter(jpg_html == "jpg") %>% 
  # pull the info url column
  dplyr::select(info_url) %>% 
  # sample 1
  dplyr::sample_n(size = 1) %>% 
  # convert to character 
  base::as.character() 
test_image
## [1] "http://www.tdcj.state.tx.us/death_row/dr_info/chappellwilliam.jpg"
You should see an image in the RStudio viewer pane (like below).
# pass test_image to image_read()
magick::image_read(test_image)
Use dplyr::filter() to subset ExecOffenders into ExOffndrsCompleteJpgs, keeping only the rows that link to a .jpg. Put these URLs into a vector (jpg_url), then create a folder to download them into and build the destination paths (jpg_path).
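A minimal sketch of the path-building step on its own, assuming two example URLs: the first is the one printed above, while the second assumes the same directory pattern and is only illustrative. The names demo_url, demo_dir and demo_path are hypothetical, and the files would go to a temporary directory rather than jpgs/ so nothing in the project is touched.

```r
# Example URLs: the first appears in the output above; the second is an
# assumed URL following the same pattern, used only for illustration
demo_url <- c(
  "http://www.tdcj.state.tx.us/death_row/dr_info/chappellwilliam.jpg",
  "http://www.tdcj.state.tx.us/death_row/dr_info/bigbyjames.jpg"
)

# Throwaway folder inside the session's temporary directory
demo_dir <- file.path(tempdir(), "jpgs")
if (!dir.exists(demo_dir)) {
  dir.create(demo_dir)
}

# basename() keeps only the file name at the end of each URL;
# file.path() joins it to the folder with the right separator
demo_path <- file.path(demo_dir, basename(demo_url))

basename(demo_path)
## [1] "chappellwilliam.jpg" "bigbyjames.jpg"
```

The full version of this step, applied to every .jpg URL in ExecOffenders, follows.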
ExOffndrsCompleteJpgs <- ExecOffenders %>% 
  dplyr::filter(jpg_html == "jpg") 
jpg_url <- ExOffndrsCompleteJpgs$info_url
# create the download folder only if it does not exist yet
if (!base::dir.exists("jpgs")) {
  base::dir.create("jpgs")
}
jpg_path <- paste0("jpgs/", 
                   # create basename
                   base::basename(jpg_url))
jpg_path %>% utils::head()
## [1] "jpgs/_coble.jpg"         "jpgs/jenningsrobert.jpg"
## [3] "jpgs/_ramos.jpg"         "jpgs/bigbyjames.jpg"    
## [5] "jpgs/ruizroland.jpg"     "jpgs/garciagustavo.jpg"
purrr::walk2() to download all files
Now use the purrr::walk2() function to download the files. How does walk2 work?
First look at the arguments for utils::download.file().
?utils::download.file
walk2()
The help files tell us the walk2 function is “specialized for the two argument case”. So .x and .y supply the two arguments that download.file() needs on each iteration. We will walk through this step-by-step below:
.x = the URL of each image file (stored in jpg_url)
.y = the location where we want each file to end up (jpg_path), and
.f = the function we want to apply to each pair of .x and .y (download.file).
When we pass everything to purrr::walk2, R will go to each URL, download the file located there, and save it to the matching path in the jpgs/ folder.
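The pairing described above can be tried without touching the network. The sketch below is only an illustration: fake_download() is a hypothetical stand-in that takes the same first two arguments as download.file() but writes a small text file, and the toy_* names and example.com URLs are made up for the demonstration.

```r
library(purrr)

# The first two arguments of download.file() are `url` and `destfile`,
# so walk2() fills them positionally with .x and .y
names(formals(utils::download.file))[1:2]
## [1] "url"      "destfile"

# Hypothetical stand-in for download.file(): same first two arguments,
# but it writes a line of text instead of fetching anything
fake_download <- function(url, destfile) {
  writeLines(paste("contents of", url), destfile)
}

# Made-up inputs, written to a temporary folder
toy_dir <- file.path(tempdir(), "walk2-demo")
dir.create(toy_dir, showWarnings = FALSE)
toy_url  <- c("http://example.com/a.jpg", "http://example.com/b.jpg")
toy_path <- file.path(toy_dir, basename(toy_url))

# walk2() calls fake_download(toy_url[i], toy_path[i]) for each i and
# returns .x invisibly, because it is meant for side effects
purrr::walk2(.x = toy_url, .y = toy_path, .f = fake_download)

file.exists(toy_path)
## [1] TRUE TRUE
readLines(toy_path[2])
## [1] "contents of http://example.com/b.jpg"
```

If the downloaded images turn out corrupted on Windows, one thing to try is wrapping the call so the file is written in binary mode, for example .f = function(url, destfile) download.file(url, destfile, mode = "wb").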
Execute the code below and you will see the .jpg files downloading into the jpgs/ folder.
purrr::walk2(.x = jpg_url, 
             .y = jpg_path, 
             .f = download.file)
You should see the following in your console.