In the previous .Rmd, we downloaded the data table from the Texas Department of Criminal Justice website, which keeps a record of every inmate it executes. These data come from the .Rmd we used to scrape the website and are stored in the folder below.
fs::dir_tree("../data/wk10-dont-mess-with-texas/")
## ../data/wk10-dont-mess-with-texas/
## ├── 2021-11-21-ExecutedOffenders.csv
## ├── 2021-11-30-ExecutedOffenders.csv
## └── processed
##     ├── 2021-11-21
##     │   ├── 2021-11-21-ExExOffndrshtml.csv
##     │   ├── 2021-11-21-ExExOffndrsjpg.csv
##     │   └── ExOffndrsComplete.csv
##     └── 2021-11-30
##         └── ExOffndrsComplete.csv
This will import the most recent data.
# fs::dir_ls("data/processed/2021-10-25")
ExecOffenders <- readr::read_csv("https://bit.ly/2Z7pKTI")
ExOffndrsComplete <- readr::read_csv("https://bit.ly/3oLZdEm")
In this post, we will use purrr's iteration tools to download the images attached to the website profiles.

purrr's iteration tools to download the .jpg files

Follow these three purrr steps from the workshop by Charlotte Wickham. We'll go over them below:
We can test the new url columns in the ExecOffenders data with the magick::image_read() function.
library(magick)
test_image <- ExecOffenders %>%
  # keep only the jpg rows
  dplyr::filter(jpg_html == "jpg") %>%
  # pull the info url column
  dplyr::select(info_url) %>%
  # sample a single row
  dplyr::sample_n(size = 1) %>%
  # convert to character
  base::as.character()
test_image
## [1] "http://www.tdcj.state.tx.us/death_row/dr_info/chappellwilliam.jpg"
You should see an image in the RStudio viewer pane (like the one below).
# pass test_image to image_read()
magick::image_read(test_image)
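If you prefer a programmatic check over the viewer pane, magick::image_info() reports the format and dimensions of the image. This is an optional sketch of my own (test_img is a name I've introduced here), not part of the original workflow.
# optional check: image_info() returns format, width, and height
test_img <- magick::image_read(test_image)
magick::image_info(test_img)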
dplyr::filter() the ExecOffenders data into ExOffndrsCompleteJpgs. Put these urls into a vector (jpg_url), then create a folder to download them into (jpg_path).
ExOffndrsCompleteJpgs <- ExecOffenders %>%
dplyr::filter(jpg_html == "jpg")
jpg_url <- ExOffndrsCompleteJpgs$info_url
if (!base::file.exists("jpgs/")) {
  base::dir.create("jpgs/")
}
jpg_path <- paste0("jpgs/",
                   # create basename
                   base::basename(jpg_url))
jpg_path %>% utils::head()
## [1] "jpgs/_coble.jpg" "jpgs/jenningsrobert.jpg"
## [3] "jpgs/_ramos.jpg" "jpgs/bigbyjames.jpg"
## [5] "jpgs/ruizroland.jpg" "jpgs/garciagustavo.jpg"
purrr::walk2() to download all files

Now use the purrr::walk2() function to download the files. How does walk2() work? First, look at the arguments for utils::download.file().
?utils::download.file
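Before iterating over every file, it can help to test download.file() on a single element. This is a sketch of my own, not part of the original post; the mode = "wb" argument is my assumption, added to keep binary files intact on Windows.
# hypothetical single-file test: download just the first .jpg
utils::download.file(url = jpg_url[1],
                     destfile = jpg_path[1],
                     # "wb" = write binary; avoids corrupt .jpgs on Windows
                     mode = "wb")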
walk2()

The help files tell us the walk2() function is "specialized for the two argument case", so .x and .y become the two arguments we need to iterate over download.file(). We will walk through this step-by-step below (a toy sketch after the list shows how the pairing works):

.x = the url of each file, which we created with the selector gadget above (in jpg_url)

.y = the location we want the files to end up (jpg_path), and

.f = the function we want to iterate over .x and .y (download.file).
When we pass everything to purrr::walk2(), R will go to each url, download the file located there, and put it in the associated jpgs/ folder.

Execute the code below and you will see the .jpg files downloading into the jpgs/ folder.
purrr::walk2(.x = jpg_url,
             .y = jpg_path,
             .f = download.file)
You should see the following in your console.