Debugging in RStudio

Developing code with browser()

code
debugging
Author

Martin Frigaard

Published

June 1, 2023

In this post I’ll cover using the browser() function with RStudio’s debugger. RStudio’s debugging tools are built into the IDE, which provides a seamless transition between writing, running, and debugging code.

Debugging

Debuggers are a critical tool when you’re programming, and they have several benefits that make them a must-use for any R user. You’ll inevitably encounter an error or unexpected behavior while you’re programming. Using a debugger allows you to ‘step through’ your code line-by-line, which makes it easier to find the precise location of bugs and errors and the conditions under which they occur.

But debuggers aren’t only helpful in dealing with errors. The debugger can also be a great learning tool because it provides an interactive way to see how the code is being executed and the order in which functions are being called. For example, you might know that a function returns a particular object but can’t determine how that object was created. Debugging lets us get ‘under the hood’ of our code and see how it’s really working.

You’re probably doing some version of debugging already. If you’ve ever dropped a call to print() or return() at some well-placed intermediate point in a function to try and understand its behavior, then you know the challenge debugging tries to solve: We can’t see what happens inside the parentheses when code is executed. When you use print() or return() in this way, it’s an attempt to indirectly investigate how/if/where the code is performing its intended purpose.

In this post, I’ll cover using the browser() function and RStudio’s debugger while developing a series of small, modular functions for returning a table of ‘package data structures.’ The code for this post comes from dbap (‘debugging app-package’).

Getting started

I want to create a function that returns a table of ‘data structure’ columns that describe the available data.frame or tibble objects loaded with a package. Below is a small example of the desired return object from this function:

Package   Dataset  Class                    Columns  Rows   Logical  Numeric  Character  Factor  List
dplyr     storms   tbl_df, tbl, data.frame  13       19066  0        11       1          1       0
datasets  mtcars   data.frame               11       32     0        11       0          0       0

This table shows the storms data from dplyr and the mtcars data from datasets. The columns include the Package the data came from, the dataset name (Dataset), the Class of the data object, the total number of Columns and Rows, and the number of columns by type (Logical, Numeric, Character, Factor, and List). The full output also includes the data Title from the documentation, which is omitted here.

One of the first steps for creating this function is to verify that a package’s namespace is loaded. I’ve written check_pkg_ns() to check this.

check_pkg_ns()
check_pkg_ns <- function(pkg, quiet = FALSE) {
  if (isFALSE(quiet)) {
    # with messages
    if (!isNamespaceLoaded(pkg)) {
      if (requireNamespace(pkg, quietly = FALSE)) {
        cat(paste0("Loading package: ", pkg, "\n"))
      } else {
        stop(paste0(pkg, " not available"))
      }
    } else {
      cat(paste0("Package ", pkg, " loaded\n"))
    }
  } else {
    # without messages
    if (!isNamespaceLoaded(pkg)) {
      if (requireNamespace(pkg, quietly = TRUE)) {
      } else {
        stop(paste0(pkg, " not available"))
      }
    }
  }
}

check_pkg_ns() checks if a package’s namespace is loaded, and if not, loads it. This function assumes the package (pkg) has been installed with install.packages(). (I’ve also written check_pkg_inst() to check if the package has been installed.)

Experiment

Before debugging, I’ll read the documentation and help files to find examples or use cases for ‘mini-experiments.’ These are designed to clarify any function arguments and learn how the code truly works. Experiments should produce predictable, definitive (preferably incompatible) outputs from each function.

Namespace functions

The help file contains the following helpful statement on isNamespaceLoaded():

isNamespaceLoaded(pkg) is equivalent to but more efficient than pkg %in% loadedNamespaces()
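
A quick way to see this equivalence is to run both forms over a few names and compare the answers. This is a minimal sketch that assumes only base R; the names in pkgs are arbitrary examples, and the last one is a made-up name that should not exist.

```r
# Compare the two ways of asking "is this namespace loaded?"
pkgs <- c("base", "stats", "notARealPackage123")

# isNamespaceLoaded() is called once per name
via_is_loaded <- vapply(pkgs, isNamespaceLoaded, logical(1))

# %in% is vectorized over its left-hand side
via_in <- pkgs %in% loadedNamespaces()

# both approaches should agree for every name
data.frame(pkg = pkgs, via_is_loaded, via_in, row.names = NULL)
```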

First, I’ll check the loaded namespaces with loadedNamespaces(), then look for a package I know isn’t in the namespace with isNamespaceLoaded(). I’ll use the fs package because it isn’t loaded or attached to the search() list:

# what's in the namespace? 
loadedNamespaces()
 [1] "compiler"   "rsconnect"  "graphics"  
 [4] "tools"      "rstudioapi" "utils"     
 [7] "grDevices"  "stats"      "datasets"  
[10] "methods"    "base"

Check if fs is in the loaded namespace:

# verify fs is not loaded
isNamespaceLoaded("fs")
[1] FALSE

The help file tells me the following about requireNamespace:

requireNamespace is a wrapper for loadNamespace analogous to require() that returns a logical value.

…and…

requireNamespace returns TRUE if it succeeds or FALSE

I’ll load a package ("fs") with requireNamespace() and verify it’s in the namespace with isNamespaceLoaded().

# add "fs" to the namespace
requireNamespace("fs")
Loading required namespace: fs
[1] TRUE
# verify it's been added 
isNamespaceLoaded("fs")
[1] TRUE

Finally, I’ll unload the "fs" package from the namespace so it can be tested in the debugger.

# remove fs
unloadNamespace("fs")
# verify fs has been unloaded
isNamespaceLoaded("fs")
[1] FALSE

The great thing about designing these mini experiments is that they can be quickly converted into testthat tests. I’m now confident I can use the namespace functions to:

  1. View the loaded package namespaces
  2. Check for a specific package in the loaded namespaces
  3. Require that a package namespace be loaded
  4. Remove a loaded package namespace

These are the behaviors I want to confirm in check_pkg_ns() using the browser() function.
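
As a rough illustration of how the experiments turn into checks, the sketch below restates them as base R stopifnot() assertions; each one could be swapped for a testthat expectation such as expect_true(). It assumes the splines package (which ships with R) as a stand-in for fs, so it runs without installing anything.

```r
# Assumption: "splines" stands in for "fs" so the sketch runs on any R install
pkg <- "splines"
was_loaded <- isNamespaceLoaded(pkg)

# requireNamespace() returns TRUE when the namespace loads (or is already loaded)
stopifnot(requireNamespace(pkg, quietly = TRUE))

# both checks agree that the namespace is now loaded
stopifnot(isNamespaceLoaded(pkg), pkg %in% loadedNamespaces())

# unload it again, but only if this sketch was what loaded it
if (!was_loaded) {
  unloadNamespace(pkg)
  stopifnot(!isNamespaceLoaded(pkg))
}
```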

browser()

If I want to explore the behaviors of the namespace functions in check_pkg_ns(), I need to add browser() somewhere I can ‘step into’ this function and then proceed through line-by-line. In this case, the top of the function makes sense:

(a) browser() in check_pkg_ns()
Figure 1: browser() placement in check_pkg_ns()
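
In code, that placement might look like the sketch below: the check_pkg_ns() definition from earlier in this post with a browser() call added as the first line. Defining the function does not pause anything; the debugger only opens when the function is called in an interactive session.

```r
check_pkg_ns <- function(pkg, quiet = FALSE) {
  browser() # execution pauses here when the function is called
  if (isFALSE(quiet)) {
    # with messages
    if (!isNamespaceLoaded(pkg)) {
      if (requireNamespace(pkg, quietly = FALSE)) {
        cat(paste0("Loading package: ", pkg, "\n"))
      } else {
        stop(paste0(pkg, " not available"))
      }
    } else {
      cat(paste0("Package ", pkg, " loaded\n"))
    }
  } else {
    # without messages
    if (!isNamespaceLoaded(pkg)) {
      if (requireNamespace(pkg, quietly = TRUE)) {
      } else {
        stop(paste0(pkg, " not available"))
      }
    }
  }
}

# run interactively to enter debug mode:
# check_pkg_ns("fs")
```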

Debug mode

To enter debugging mode, I’ll need to run check_pkg_ns() or source R/check_pkg_ns.R with the package I used in my experiments.

check_pkg_ns("fs")
(a) Debug mode
Figure 2: RStudio IDE in debug mode

The browser() function is one of several ways to use RStudio’s debugging tools (see the TIP callout box below for more).

TIP: Other debugging methods

In this post, I focused on using the browser() function to enter debug mode, but RStudio has several built-in tools that can help you debug your R code:

  • Debug function on error: You can set R to automatically enter the debugger when an error occurs by using options(error = utils::recover). Then, when an error occurs, you’ll be given a menu of places to browse, the most recent (the location where the error occurred) first.

  • Breakpoints: Breakpoints can be set in your R scripts to pause execution at a particular line of code. You can add breakpoints by clicking to the left of the line number in the script editor or by pressing Shift+F9 with your cursor on the desired line. Then, run your code. Execution will stop just before the line with the breakpoint, allowing you to inspect the current state of the environment.

  • debug(): You can use debug(function_name) to flag a function for “debug” mode. When you call the function, the debugger will open and stop at the first line of the function, where you can step through the function line by line, inspect the environment, and see what’s happening at each step.

  • traceback(): When an error occurs, you can call traceback() to get a stack trace that shows you the sequence of calls that led up to the error.

  • Code Diagnostics: RStudio provides real-time notifications about potential issues in your code, like syntax errors or unused variables. These are not technically part of the debugger, but diagnostics will help you avoid bugs before you run your code.
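
As a small illustration of the debug() entry in this list, the sketch below flags a function, confirms the flag with isdebugged(), and removes it with undebug(). The add_one() function is a hypothetical stand-in and is not part of dbap. The function is never called while flagged, so nothing pauses here.

```r
# A small stand-in function (hypothetical, not part of dbap)
add_one <- function(x) {
  x + 1
}

# flag the function: the next call would open the debugger
debug(add_one)
flagged <- isdebugged(add_one)

# remove the flag so later calls run normally
undebug(add_one)
unflagged <- isdebugged(add_one)

c(flagged = flagged, unflagged = unflagged)

# debugonce() flags the function for a single call only (run interactively):
# debugonce(add_one)
# add_one(1)
```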

You should read this blog post and this chapter of Advanced R, 2nd Ed. for more information on the various debugging methods.

Console

When the browser() function is called, the Console enters the ‘interactive browser environment,’ tells me where the debugging function was called from, and changes the prompt to Browse[1]>:

Called from: check_pkg_ns("fs")
Browse[1]> 

I can use the Console to inspect variables and ‘step through’ the function code.

(a) Debug mode in Console
Figure 3: Debug mode with browser() in Console

The debugger toolbar is also placed at the top of the Console:

(a) Debug toolbar Console
Figure 4: Debug toolbar in Console

I can use the toolbar or enter the following commands in the Console:

  • n (next): execute the next step in the function

  • s (step into): step into the function call on the current line

  • c (continue): continue normal execution without stepping

  • f (finish): execute the rest of the current loop or function

  • Q (Quit): quit the debugger

I’ll return to the Console in a bit (this is where most of the debugging is done), but let’s view the other changes to the IDE first.

Source

In the Source pane, we can see the line with browser() has been highlighted with an arrow:

(a) Debug mode in Source
Figure 5: Debug mode with browser() in Source

The Source pane will continually update and highlight my execution position (i.e., what’s going to be executed next) as I ‘step through’ the code.

After we’ve finished debugging, it’s important to remember to remove the browser() call so it isn’t triggered the next time the function runs.
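
One way to catch a forgotten call is to scan the function files before committing. The helper below is a hypothetical sketch (find_browser_calls() is not part of dbap or RStudio). It uses a plain text match, so it will also flag commented-out calls.

```r
# Hypothetical helper: list lines in .R files that still contain browser(
find_browser_calls <- function(path = "R") {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  hits <- lapply(files, function(file) {
    lines <- readLines(file, warn = FALSE)
    idx <- grep("browser\\(", lines)
    if (length(idx) == 0) {
      return(NULL)
    }
    data.frame(
      file = basename(file),
      line = idx,
      code = trimws(lines[idx]),
      stringsAsFactors = FALSE
    )
  })
  # drop files without matches, then stack the rest
  do.call(rbind, Filter(Negate(is.null), hits))
}

# demo on two throwaway files in a temporary folder
demo_dir <- file.path(tempdir(), "browser-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  c("f <- function(x) {", "  browser()", "  x + 1", "}"),
  file.path(demo_dir, "f.R")
)
writeLines("g <- function(x) x * 2", file.path(demo_dir, "g.R"))

leftover <- find_browser_calls(demo_dir)
leftover
```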

Environment

The Environment pane switches from the global environment to the environment of the function that’s currently being executed in the Console:

(a) Debug mode in Environment
Figure 6: Debug mode with browser() in Environment

In the case of check_pkg_ns(), I can see the Values section contains the pkg ("fs") and quiet (FALSE) arguments.

Other environments

The drop-down list of environments above the Values is arranged in reverse hierarchical order: the Global Environment is listed below check_pkg_ns() in the drop-down, even though it sits above the check_pkg_ns() environment in the chain of enclosing environments:
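
The same relationship can be checked in code. The sketch below uses a hypothetical stand-in function, peek_env(), with the same arguments as check_pkg_ns(). It reports which values live in the function’s environment and which environment encloses it. When a function is defined at the top level, that enclosing environment is the Global Environment.

```r
# Hypothetical stand-in with the same arguments as check_pkg_ns()
peek_env <- function(pkg, quiet = FALSE) {
  fn_env <- environment() # the environment created for this call
  list(
    values = setdiff(ls(fn_env), "fn_env"),
    parent = parent.env(fn_env)
  )
}

envs <- peek_env("fs")

# the values the Environment pane would list for this call
envs$values

# the parent is the environment where peek_env() was defined
identical(envs$parent, environment(peek_env))
```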

(a) Items in Environment debug mode
Figure 7: Environments with debugger

Traceback

The traceback (or ‘call stack’) is the ‘stack’ of functions that have been run thus far:

(a) Traceback in Environment
Figure 8: Environment Traceback viewer

Clicking on an item in the traceback displays that function’s code and the contents of its environment. Right now, it includes the call to source("R/check_pkg_ns.R") and the ‘Debug source’ call to check_pkg_ns("fs").

If the Show internals option is selected, the internal functions are shown (slightly subdued in gray).

(a) Traceback internals
Figure 9: Traceback internals
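
The call stack shown in the Traceback viewer can also be inspected from code with sys.calls(). This is a minimal sketch with two hypothetical functions (outer_fun() and inner_fun()). Only the last two frames are kept because anything earlier depends on how the code was run (for example, through source()).

```r
inner_fun <- function() {
  calls <- sys.calls() # every call that is currently active
  as.list(calls)
}

outer_fun <- function() {
  inner_fun()
}

calls <- outer_fun()

# keep the last two frames and turn each call into text
stack <- vapply(
  tail(calls, 2),
  function(cl) deparse(cl)[1],
  character(1)
)
stack
```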

Arguments

The pkg argument can be printed to verify its contents.

Browse[1]> pkg
[1] "fs"

The debugger lets me view the state of a function’s values or variables at each execution step, which helps me understand any incorrect or unexpected values.

Based on the help files and my experiments, check_pkg_ns() should check whether the pkg namespace is loaded and, if it isn’t, load it.

I can also check the code from the mini experiments inside the debugger Console to see if the fs namespace has been loaded:

Browse[1]> isNamespaceLoaded("fs")
[1] FALSE

At my current location in check_pkg_ns(), the fs package hasn’t been loaded.

Stepping through

I can begin ‘stepping through’ check_pkg_ns() by entering n in the Console:

Browse[1]> n

Notice that after entering n in the Console, the debugger tells me where the browser() function has paused execution (debug at /path/to/function/file.R) and the line number (#27), and the check_pkg_ns() function is printed to the Console (I’ve omitted it here):

Browse[1]> n
debug at ~/projects/apps/dbap/R/check_pkg_ns.R#27:

<...check_pkg_ns() function...>

Browse[2]>

The prompt also changes from Browse[1]> to Browse[2]> to let me know I’m inside the check_pkg_ns() function.

I’ll use n (or Next) to continue following the path pkg takes through the function:

(a) Use Console to step through function
Figure 10: Use n to step through check_pkg_ns()

When I land on the line after the call to requireNamespace(), I can check to see if the fs namespace has been loaded with isNamespaceLoaded("fs"):

Browse[2]> isNamespaceLoaded("fs")
[1] TRUE

Inspect values

Now that I’ve confirmed check_pkg_ns() works with fs, I should also confirm it works with a development package (i.e., not on CRAN). I can test this with the roxygen2Comment package–it contains an addin for pasting roxygen2 comment blocks.

To quit debug mode, I can enter Q in the Console or click on the red square (Stop) icon in the toolbar.

Browse[2]> Q

I’ll confirm roxygen2Comment is not loaded with isNamespaceLoaded(), then change the pkg argument in check_pkg_ns() and re-run the function:

isNamespaceLoaded("roxygen2Comment")
[1] FALSE
> check_pkg_ns("roxygen2Comment")
Called from: check_pkg_ns("roxygen2Comment")
Browse[1]> 

This time, when I step through check_pkg_ns(), I notice pkg takes an alternative path:

(a) Alternative path through function
Figure 11: Development package in check_pkg_ns()

When the Source pane highlights the stop() function, I can check to confirm this package wasn’t loaded:

Browse[2]> isNamespaceLoaded("roxygen2Comment")
[1] FALSE

If I enter n one more time in the Console, I see the stop() error from the function is returned:

Browse[2]> n
Error in check_pkg_ns("roxygen2Comment") : 
  roxygen2Comment not available

I’ll perform one last check on check_pkg_ns(): what if I want to pass multiple packages to pkg? I’ll check this with fs and box.

# First make sure these aren't loaded...
unloadNamespace("fs")
unloadNamespace("box")
# Now combine into vector
pkgs <- c("fs", "box")
check_pkg_ns(pkgs)

After entering debug mode, I want to proceed to the control flow and verify the pkgs variable:

> check_pkg_ns(pkgs)
Called from: check_pkg_ns(pkgs)
Browse[1]> n
Browse[2]> pkgs
[1] "fs"  "box"

This confirms both packages are in the pkg variable. If I use n to proceed through to the end of check_pkg_ns(), I see the loading message is printed once for each package:

Browse[2]> n
Loading package: fs
Loading package: box
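
The message is built with paste0(), which is vectorized, so it prints once per element of pkg whatever happened to each namespace. As far as I know, requireNamespace() and isNamespaceLoaded() only look at the first element of a character vector, so it is worth confirming each namespace separately. The helper below is a hypothetical sketch (check_each_ns() is not part of dbap), and base R packages stand in for fs and box so it runs anywhere.

```r
# Hypothetical helper: check each package name on its own
check_each_ns <- function(pkgs) {
  vapply(pkgs, function(p) {
    # requireNamespace() returns FALSE (quietly) if the package is missing
    requireNamespace(p, quietly = TRUE) && isNamespaceLoaded(p)
  }, logical(1))
}

# Assumption: "stats" and "splines" ship with R; the last name is made up
loaded <- check_each_ns(c("stats", "splines", "notARealPackage123"))
loaded
```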

browser() recap

Once execution is paused with browser(), using the n command in the Console (or in the debugging toolbar at the top-right of the pane) lets me step through the code line-by-line.

(a) Step through/over
Figure 12: Step through/over code

This allows me to inspect the state of the variables at various points within a function.

Nested functions

The check_pkg_ns() function is fairly basic in that it performs a single ‘unit of work’ (i.e., check whether a package’s namespace has been loaded; if not, load it). When functions become more complex, it’s more efficient to use nested functions (i.e., functions called within other functions), which let me run several steps with a single call.

An example of this is the pkg_data_results() function below:

pkg_data_results()
pkg_data_results("dplyr")
## # A tibble: 5 × 3
##   Package Item              Title              
##   <chr>   <chr>             <chr>              
## 1 dplyr   band_instruments  Band membership    
## 2 dplyr   band_instruments2 Band membership    
## 3 dplyr   band_members      Band membership    
## 4 dplyr   starwars          Starwars characters
## 5 dplyr   storms            Storm tracks data

pkg_data_results() returns a tibble with three columns: Package, Item, and Title.

The output from pkg_data_results() comes from the data(package = "pkg") output:

(a) Output from data(package = )
Figure 13: data(package = "dplyr")

This output is normally opened in a separate window, but the underlying results are stored as a character matrix.

structure of data(package =)
str(data(package = "dplyr"))
## List of 4
##  $ title  : chr "Data sets"
##  $ header : NULL
##  $ results: chr [1:5, 1:4] "dplyr" "dplyr" "dplyr" "dplyr" ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:4] "Package" "LibPath" "Item" "Title"
##  $ footer : NULL
##  - attr(*, "class")= chr "packageIQR"

pkg_data_results() converts the matrix output into a tibble with three columns (Package, Item, and Title).
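
The conversion can be tried in isolation. The sketch below uses the datasets package (always installed with R) instead of dplyr and a plain data.frame instead of a tibble, so it needs nothing beyond base R.

```r
# data() returns a 'packageIQR' object; its results element is a character matrix
pkg_data <- utils::data(package = "datasets")
is.matrix(pkg_data$results)

# keep three of the four columns and convert the matrix to a data.frame
results <- as.data.frame(
  pkg_data$results[, c("Package", "Item", "Title"), drop = FALSE],
  stringsAsFactors = FALSE
)
head(results, 3)
```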

I’ve placed browser() at the top of pkg_data_results() and run it with the fivethirtyeight package.

pkg_data_results("fivethirtyeight")
(a) browser() in pkg_data_results("fivethirtyeight")
Figure 14: browser() in pkg_data_results("fivethirtyeight")

Step into

When the debugger lands on check_pkg_ns(), I can follow the fivethirtyeight package through this function by ‘stepping into’ it with s in the Console (or the toolbar icon):

(a) Step into a function
Figure 15: Step into check_pkg_ns() in pkg_data_results("fivethirtyeight")

Debugging ‘at’ vs ‘in’

In the Console, there are now debugging in and debug at locations:

Browse[2]> s
debugging in: check_pkg_ns(pkg = pkg, quiet = TRUE)
debug at /apps/dbap/R/check_pkg_ns.R#25: 

The debugging in location is the function I stepped into, and the debug at location is the file and line where execution is currently paused.

The prompt has also changed from Browse[2]> to Browse[3]>:

Browse[3]>
(a) Use s to step into check_pkg_ns()
Figure 16: Use s to step through check_pkg_ns()

The R/check_pkg_ns.R file will open with the highlighted function. I can proceed through check_pkg_ns() using n until I reach requireNamespace():

(a) Use n to step through check_pkg_ns()
Figure 17: Use n to step through check_pkg_ns()

When I reach the final line in check_pkg_ns(), I can use either method below to verify the pkg namespace is loaded:

Browse[3]> pkg %in% loadedNamespaces()
[1] TRUE
Browse[3]> isNamespaceLoaded(pkg)
[1] TRUE

After the last line of check_pkg_ns() has been evaluated, the debugger will automatically return to the pkg_data_results() function. The Source pane will highlight the final step (and the prompt returns to Browse[2]>):

(a) Step into/through check_pkg_ns() from pkg_data_results()
Figure 18: Step into and through check_pkg_ns() from pkg_data_results()

A final n command in the Console will return the output table:

Browse[2]> n
## # A tibble: 129 × 3
##    Package         Item                Title
##    <chr>           <chr>               <chr>
##  1 fivethirtyeight US_births_1994_2003 Some People Are Too Superstitious To …
##  2 fivethirtyeight US_births_2000_2014 Some People Are Too Superstitious To …
##  3 fivethirtyeight ahca_polls          American Health Care Act Polls
##  4 fivethirtyeight airline_safety      Should Travelers Avoid Flying Airline…
##  5 fivethirtyeight antiquities_act     Trump Might Be The First President To…
##  6 fivethirtyeight august_senate_polls How Much Trouble Is Ted Cruz Really  …
##  7 fivethirtyeight avengers            Joining The Avengers Is As Deadly As
##  8 fivethirtyeight bachelorette        Bachelorette / Bachelor
##  9 fivethirtyeight bad_drivers         Dear Mona, Which State Has The Worst …
## 10 fivethirtyeight bechdel             The Dollar-And-Cents Case Against Hol…
## # ℹ 119 more rows
## # ℹ Use `print(n = ...)` to see more rows

Put it all together

The initial pkg_data_str() function for returning a table of ‘package data structures’ is below.

expand to see initial pkg_data_str()
pkg_data_str <- function(pkg) {

  data_results <- pkg_data_results(pkg = pkg)

  ds_list <- purrr::map2(
    .x = data_results[["Item"]], 
    .y = data_results[["Package"]],
    .f = pkg_data_object, .progress = TRUE
  )

  cols_tbl <- dplyr::mutate(data_results,
    Class = purrr::map(.x = ds_list, .f = class) |>
      purrr::map(paste0, collapse = ", ") |> unlist(),
    Columns = purrr::map(.x = ds_list, .f = ncol) |>
      purrr::map(paste0, " columns") |> unlist(),
    Rows = purrr::map(.x = ds_list, .f = nrow) |>
      purrr::map(paste0, " rows") |> unlist(),
    Logical = purrr::map(
      .x = ds_list,
      .f = col_type_count, "log"
    ) |> unlist(),
    Numeric = purrr::map(
      .x = ds_list,
      .f = col_type_count, "num"
    ) |> unlist(),
    Character = purrr::map(
      .x = ds_list,
      .f = col_type_count, "chr"
    ) |> unlist(),
    Factor = purrr::map(
      .x = ds_list,
      .f = col_type_count, "fct"
    ) |> unlist(),
    List = purrr::map(
      .x = ds_list,
      .f = col_type_count, "lst"
    ) |> unlist(),
  )

  pkg_tbls_dfs <- dplyr::filter(cols_tbl,
    stringr::str_detect(Class, "data.frame")
  )

  return(pkg_tbls_dfs)
}

pkg_data_str() uses nested functions to create the following intermediate objects I can check while developing with browser() (the example below uses the forcats package)

Data results

The output from pkg_data_results() is stored in data_results:

data_results <- pkg_data_results(pkg = pkg)
Browse[2]> data_results
# A tibble: 1 × 3
  Package Item    Title                                                           
  <chr>   <chr>   <chr>                                                           
1 forcats gss_cat A sample of categorical variables from the General Social su...

Package data objects

After extracting the Package, Item, and Title columns from pkg_data_results(), I use purrr::map2() to iterate over each Item and Package, which builds a list of datasets (ds_list). The .f argument is a nested pkg_data_object() function, which calls base::get().

ds_list <- purrr::map2(
  .x = data_results[["Item"]],
  .y = data_results[["Package"]],
  .f = pkg_data_object, .progress = TRUE
)
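
The definition of pkg_data_object() isn’t shown in this post. A function with that job might look like the hypothetical sketch below, which loads one dataset into a throwaway environment with utils::data() and retrieves it with get(). The name pkg_data_object_sketch() and the approach are assumptions; the dbap version may differ.

```r
# Hypothetical sketch of pkg_data_object(); the dbap version may differ
pkg_data_object_sketch <- function(ds, pkg) {
  tmp_env <- new.env()
  # load the dataset into a throwaway environment, not the workspace
  utils::data(list = ds, package = pkg, envir = tmp_env)
  get(ds, envir = tmp_env, inherits = FALSE)
}

cars_df <- pkg_data_object_sketch("mtcars", "datasets")
dim(cars_df)
```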

I’ll view the contents of the list with str():

Browse[2]> str(ds_list)
List of 1
 $ : tibble [21,483 × 9] (S3: tbl_df/tbl/data.frame)
  ..$ year   : int [1:21483] 2000 2000 2000 2000 2000 2000 2000 2000 ...
  ..$ marital: Factor w/ 6 levels "No answer","Never married",..: 2 4 ...
  ..$ age    : int [1:21483] 26 48 67 39 25 25 36 44 44 47 ...
  ..$ race   : Factor w/ 4 levels "Other","Black",..: 3 3 3 3 3 3 3 3 3 3 ...
  ..$ rincome: Factor w/ 16 levels "No answer","Don't know",..: 8 8 16 16 ...
  ..$ partyid: Factor w/ 10 levels "No answer","Don't know",..: 6 5 7 6  ...
  ..$ relig  : Factor w/ 16 levels "No answer","Don't know",..: 15 15 15 ...
  ..$ denom  : Factor w/ 30 levels "No answer","Don't know",..: 25 23 3 ...
  ..$ tvhours: int [1:21483] 12 NA 2 4 1 NA 3 NA 0 3 ...

Column counts

The ds_list created above is used to add the Class, Columns, and Rows columns to data_results using class(), ncol(), and nrow(). The column counts are added with the col_type_count() function.

  cols_tbl <- dplyr::mutate(data_results,
    Class = purrr::map(.x = ds_list, .f = class) |>
      purrr::map(paste0, collapse = ", ") |> unlist(),
    Columns = purrr::map(.x = ds_list, .f = ncol) |>
      purrr::map(paste0, " columns") |> unlist(),
    Rows = purrr::map(.x = ds_list, .f = nrow) |>
      purrr::map(paste0, " rows") |> unlist(),
    Logical = purrr::map(
      .x = ds_list,
      .f = col_type_count, "log"
    ) |> unlist(),
    Numeric = purrr::map(
      .x = ds_list,
      .f = col_type_count, "num"
    ) |> unlist(),
    Character = purrr::map(
      .x = ds_list,
      .f = col_type_count, "chr"
    ) |> unlist(),
    Factor = purrr::map(
      .x = ds_list,
      .f = col_type_count, "fct"
    ) |> unlist(),
    List = purrr::map(
      .x = ds_list,
      .f = col_type_count, "lst"
    ) |> unlist(),
  )
Browse[2]> cols_tbl
# A tibble: 1 × 11
  Package Item    Title          Class Columns Rows  Logical Numeric Character Factor  List
  <chr>   <chr>   <chr>          <chr> <chr>   <chr>   <int>   <int>     <int>  <int> <int>
1 forcats gss_cat A sample of c… tbl_… 9 colu… 2148…       0       3         0      6     0
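
The definition of col_type_count() isn’t shown either. Judging from the output above (three integer columns counted as Numeric, six factors counted as Factor), a function like the hypothetical sketch below would produce the same kind of counts. The name col_type_count_sketch() and the type-to-predicate mapping are assumptions.

```r
# Hypothetical sketch of col_type_count(); the dbap version may differ
col_type_count_sketch <- function(df, type) {
  is_type <- switch(type,
    log = is.logical,
    num = is.numeric, # integer and double columns both count as numeric
    chr = is.character,
    fct = is.factor,
    lst = is.list,
    stop("unknown type: ", type)
  )
  # count the columns that satisfy the predicate
  sum(vapply(df, is_type, logical(1)))
}

# a toy data.frame with one logical, two numeric, one character, one factor
toy <- data.frame(
  id = 1:3,
  score = c(1.5, 2, 3.25),
  name = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  flag = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

counts <- vapply(
  c("log", "num", "chr", "fct", "lst"),
  function(type) col_type_count_sketch(toy, type),
  integer(1)
)
counts
```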

Rectangular objects

Finally, cols_tbl is filtered to only those objects with a class() containing the string ‘data.frame’.

pkg_tbls_dfs <- dplyr::filter(.data = cols_tbl,
                  stringr::str_detect(Class, "data.frame"))

This is exactly the same as the previous tibble because forcats has only one data object (gss_cat), and it’s a tibble:

Browse[2]> pkg_tbls_dfs
# A tibble: 1 × 11
  Package Item    Title          Class Columns Rows  Logical Numeric Character Factor  List
  <chr>   <chr>   <chr>          <chr> <chr>   <chr>   <int>   <int>     <int>  <int> <int>
1 forcats gss_cat A sample of c… tbl_… 9 colu… 2148…       0       3         0      6     0

I’m explicitly returning pkg_tbls_dfs to view it in the debugger. When I’m confident it’s behaving as expected, I’ll remove this final object and ‘rely on R to return the result of the last evaluated expression.’

Error!

When I tried using the initial pkg_data_str() with a package that had zero data objects (fs), I got the following error:

pkg_data_str("fs")
Error in `dplyr::filter()` at dbap/R/pkg_data_str.R:78:2:
ℹ In argument: `stringr::str_detect(Class, "data.frame")`.
Caused by error in `vctrs::vec_size_common()`:
! object 'Class' not found
Run `rlang::last_trace()` to see where the error occurred.

In the debugger, I was able to pinpoint the source of this error (and the underlying condition causing it to occur).

Replicate the error

The browser() call begins at the top of pkg_data_str(), where I’ll step into pkg_data_results():

(a) pkg_data_results() from pkg_data_str()
Figure 19: Step into pkg_data_results() from pkg_data_str()

When I’m inside pkg_data_results(), I’ll use n to verify the fs package namespace was loaded and the tibble was created:

(a) Step through pkg_data_results()
Figure 20: Step through pkg_data_results() (from pkg_data_str())

Back in pkg_data_str(), the output from pkg_data_results() is stored as data_results. I can check the contents of data_results in the Console.

Browse[2]> data_results
# A tibble: 0 × 3
# ℹ 3 variables: Package <chr>, Item <chr>, Title <chr>

I see it’s empty. An empty data_results results in an empty list output from purrr::map2():

(a) Step out of pkg_data_results()
Figure 21: Step out of pkg_data_results() back into pkg_data_str()
Browse[2]> ds_list
list()

The empty ds_list results in dplyr::mutate() being unable to create the Class column in cols_tbl:

(a) dplyr::mutate() call in pkg_data_str()
Figure 22: Create Class column in pkg_data_str()
Browse[2]> cols_tbl
# A tibble: 0 × 3
# ℹ 3 variables: Package <chr>, Item <chr>, Title <chr>
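
The reason no Class column appears is that unlist() on an empty list returns NULL, and assigning NULL to a column does not create it (dplyr::mutate() treats a NULL value the same way). The sketch below reproduces this in base R, with lapply() standing in for purrr::map(), and shows that a type-stable call such as vapply() would return a zero-length character vector instead.

```r
# lapply() is a base R stand-in for purrr::map()
empty_list <- list()
class_col <- unlist(lapply(empty_list, class))
is.null(class_col) # unlist() of an empty list is NULL

# assigning NULL to a column does not create the column
tbl <- data.frame(
  Package = character(0),
  Item = character(0),
  Title = character(0)
)
tbl$Class <- class_col
names(tbl)

# a type-stable alternative keeps a zero-length character vector
class_chr <- vapply(
  empty_list,
  function(x) paste0(class(x), collapse = ", "),
  character(1)
)
class_chr
```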

This triggers the error in dplyr::filter():

Browse[2]> n
Error in `dplyr::filter()` at dbap/R/get_ds_str.R:60:2:
ℹ In argument: `stringr::str_detect(Class, "data.frame")`.
Caused by error in `vctrs::vec_size_common()`:
! object 'Class' not found
Run `rlang::last_trace()` to see where the error occurred.

The full path for the fs package through the initial pkg_data_str() is outlined in the figure below:

(a) Error path in pkg_data_str()
Figure 23: Replicate the error from pkg_data_str()

Solution

To fix this error, I had to make some changes to both pkg_data_results() and pkg_data_str():

In pkg_data_results(), I added control flow to return a tibble of logical columns (all NA) if the package doesn’t have any data objects:

Expand to view the updated pkg_data_results()
pkg_data_results <- function(pkg) {
  # load the package namespace (stops if the package isn't available)
  check_pkg_ns(pkg = pkg, quiet = TRUE)

  # data() returns a 'packageIQR' object; $results is a character matrix
  pkg_results <- data(package = pkg)$results

  results <- tibble::as_tibble(
    data.frame(
      Package = pkg_results[, "Package"],
      Item = pkg_results[, "Item"],
      Title = pkg_results[, "Title"],
      stringsAsFactors = FALSE,
      check.names = FALSE,
      row.names = NULL
    )
  )

  if (nrow(results) == 0) {
    # no data objects: return a single row of logical (NA) columns
    data_results <- tibble::as_tibble(
      data.frame(
        matrix(
          nrow = 1, ncol = 11,
          byrow = TRUE,
          dimnames = list(
            NULL,
            c(
              "Package", "Item", "Title",
              "Class", "Columns", "Rows",
              "Logical", "Numeric", "Character",
              "Factor", "List"
            )
          )
        ),
        row.names = NULL
      )
    )

    return(data_results)
  } else {
    results
  }
}

In pkg_data_str(), I added two if statements:

  • the first if statement identifies the logical NA columns (indicating the results from data(package = pkg) didn’t have any data objects)

  • the second if statement creates the Class column first, then filters the rows to only those containing a data.frame string pattern. If none of the data objects have the data.frame string pattern in their class, a single-row table of NA values is returned
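
The first check works because a matrix() created without data is filled with logical NA values, so every column of the placeholder table is logical, while a real Item column is character. This is a minimal base R sketch with three of the eleven columns:

```r
# matrix() with no data is filled with logical NA
placeholder <- data.frame(
  matrix(
    nrow = 1, ncol = 3,
    dimnames = list(NULL, c("Package", "Item", "Title"))
  ),
  row.names = NULL
)

# the placeholder column is logical...
is.logical(placeholder[["Item"]])

# ...while a real Item column is character
real_items <- c("band_members", "starwars", "storms")
is.logical(real_items)
```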

Expand to view the updated pkg_data_str()
pkg_data_str <- function(pkg) {
  
  data_results <- pkg_data_results(pkg = pkg)
  
  if (!is.logical(data_results[["Item"]])) {
    # data_results contains data objects
    ds_list <- purrr::map2(
      .x = data_results[["Item"]], 
      .y = data_results[["Package"]],
      .f = pkg_data_object, .progress = TRUE
    )

    class_tbl <- dplyr::mutate(data_results,
      Class = purrr::map(.x = ds_list, .f = class) |>
        purrr::map(paste0, collapse = ", ") |> unlist()
    )

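    df_tbl <- dplyr::filter(
      class_tbl,
      stringr::str_detect(Class, "data.frame")
    )

    # keep only the data.frame objects so ds_list lines up with the rows of df_tbl
    ds_list <- ds_list[stringr::str_detect(class_tbl[["Class"]], "data.frame")]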

    if (nrow(df_tbl) == 0) {
      # df_tbl does not contain 'data.frame' classes
      data_results <- tibble::as_tibble(
        data.frame(
          matrix(
            nrow = 1, ncol = 11,
            byrow = TRUE,
            dimnames = list(
              NULL,
              c(
                "Package", "Item", "Title",
                "Class", "Columns", "Rows",
                "Logical", "Numeric", "Character",
                "Factor", "List"
              )
            )
          ),
          row.names = NULL
        )
      )

      return(data_results)
      
    } else {
      
      # df_tbl contains 'data.frame' classes
      dplyr::mutate(df_tbl,
        Columns = purrr::map(.x = ds_list, .f = ncol) |>
          purrr::map(paste0, " columns") |> unlist(),
        Rows = purrr::map(.x = ds_list, .f = nrow) |>
          purrr::map(paste0, " rows") |> unlist(),
        Logical = purrr::map(
          .x = ds_list,
          .f = col_type_count, "log") |> unlist(),
        Numeric = purrr::map(
          .x = ds_list,
          .f = col_type_count, "num") |> unlist(),
        Character = purrr::map(
          .x = ds_list,
          .f = col_type_count, "chr") |> unlist(),
        Factor = purrr::map(
          .x = ds_list,
          .f = col_type_count, "fct") |> unlist(),
        List = purrr::map(
          .x = ds_list,
          .f = col_type_count, "lst") |> unlist())
      
    }
    
  } else {
    
    # data_results does not contain data objects
    return(data_results)
    
  }
  
}

Rather than go through the debugger process again, I’ll go through each of the mini experiments I used to check the updated pkg_data_results() and pkg_data_str() functions:

  • Check single package without any data objects (box)

    knitr::kable(
      pkg_data_str("box"))

    Package  Item  Title  Class  Columns  Rows  Logical  Numeric  Character  Factor  List
    NA       NA    NA     NA     NA       NA    NA       NA       NA         NA      NA

  • Check single package with data objects, but none with classes that contain data.frame (stringr)

    knitr::kable(
    pkg_data_str("stringr"))

    Package  Item  Title  Class  Columns  Rows  Logical  Numeric  Character  Factor  List
    NA       NA    NA     NA     NA       NA    NA       NA       NA         NA      NA

  • Check single package with multiple data objects (dplyr)

    knitr::kable(
    pkg_data_str("dplyr"))

    Package  Item               Title                Class                    Columns     Rows        Logical  Numeric  Character  Factor  List
    dplyr    band_instruments   Band membership      tbl_df, tbl, data.frame  2 columns   3 rows      0        0        2          0       0
    dplyr    band_instruments2  Band membership      tbl_df, tbl, data.frame  2 columns   3 rows      0        0        2          0       0
    dplyr    band_members       Band membership      tbl_df, tbl, data.frame  2 columns   3 rows      0        0        2          0       0
    dplyr    starwars           Starwars characters  tbl_df, tbl, data.frame  14 columns  87 rows     0        3        8          0       3
    dplyr    storms             Storm tracks data    tbl_df, tbl, data.frame  13 columns  19537 rows  0        11       1          1       0

  • Check multiple packages with multiple data objects (dplyr, forcats and lubridate)

    knitr::kable(
    pkg_data_str(c("dplyr", "forcats", "lubridate")))

    Package    Item               Title                                                             Class                    Columns     Rows        Logical  Numeric  Character  Factor  List
    forcats    gss_cat            A sample of categorical variables from the General Social survey  tbl_df, tbl, data.frame  9 columns   21483 rows  0        3        0          6       0
    lubridate  lakers             Lakers 2008-2009 basketball data set                              data.frame               13 columns  34624 rows  0        5        8          0       0
    dplyr      band_instruments   Band membership                                                   tbl_df, tbl, data.frame  2 columns   3 rows      0        0        2          0       0
    dplyr      band_instruments2  Band membership                                                   tbl_df, tbl, data.frame  2 columns   3 rows      0        0        2          0       0
    dplyr      band_members       Band membership                                                   tbl_df, tbl, data.frame  2 columns   3 rows      0        0        2          0       0
    dplyr      starwars           Starwars characters                                               tbl_df, tbl, data.frame  14 columns  87 rows     0        3        8          0       3
    dplyr      storms             Storm tracks data                                                 tbl_df, tbl, data.frame  13 columns  19537 rows  0        11       1          1       0

Recap

RStudio’s debugger is a powerful tool that can save tons of time when you’re developing new functions, discovering how a function’s code is executed, or dealing with errors. When you’ve finished debugging, remember to remove the browser() call from your function.

The steps above should help get you started, and if you’d like to learn more, check out the debugging chapter of Advanced R, and the documentation for browser(), debug()/debugonce()/undebug(), and traceback() functions.