BDD functions: use case • pickler

library(pickler)
library(testthat)

Behavior-driven development functions

testthat’s BDD functions (specifically describe()) let us add context for tests directly before the call to it() or test_that(). TRUE/FALSE results are at the heart of all unit tests (i.e., “do the results meet my expectations?” Check ‘yes’ or ‘no’).

it()

It’s a little unorthodox, but we can compare the output of passing it() and test_that() tests using waldo::compare() (waldo is the package that powers the comparisons in expectations like expect_equal()).

waldo::compare(
  test_that("make sure we did the thing right", {
    expect_equal(object = TRUE, expected = TRUE)
  }),
  it("make sure we did the thing right", {
    expect_equal(object = TRUE, expected = TRUE)
  })
)
#> Test passed 
#> Test passed
#> ✔ No differences

These two functions behave the same way. The ‘secret sauce’ of test_that() and it() is their ability to run expect_*() functions and return a logical (TRUE/pass or FALSE/fail).

unclass(
  test_that("make sure we did the thing right", {
    expect_equal(object = TRUE, expected = TRUE)
  })
)
#> Test passed
#> [1] TRUE
unclass(
  it("make sure we did the thing right", {
    expect_equal(object = TRUE, expected = TRUE)
  })
)
#> Test passed
#> [1] TRUE
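Rather than reading printed output, we can store each return value and check it directly. This is a minimal sketch that assumes the behavior shown above (a passing test_that() or it() call yields TRUE once unclassed); res_test_that and res_it are illustrative names only.

```r
library(testthat)

# Store the value returned by each passing test
res_test_that <- test_that("make sure we did the thing right", {
  expect_equal(object = TRUE, expected = TRUE)
})
res_it <- it("make sure we did the thing right", {
  expect_equal(object = TRUE, expected = TRUE)
})

# Drop any class attribute and confirm each result is a single TRUE
isTRUE(unclass(res_test_that))
isTRUE(unclass(res_it))
```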

If we attempt to unclass() the result of a failing test, we see that the failed expect_equal() expectation causes the test to raise an error (and the actual and expected values are passed to waldo::compare() to display the differences):

unclass(
  test_that("make sure we did the thing right", {
    expect_equal(object = FALSE, expected = TRUE)
  })
)
#> -- Failure: make sure we did the thing right -----------------------------------
#> FALSE (`actual`) not equal to TRUE (`expected`).
#> 
#> `actual`:   FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed
unclass(
  it("make sure we did the thing right", {
    expect_equal(object = FALSE, expected = TRUE)
  })
)
#> -- Failure: make sure we did the thing right -----------------------------------
#> FALSE (`actual`) not equal to TRUE (`expected`).
#> 
#> `actual`:   FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed
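Because a failing test run this way signals an error, base R’s tryCatch() can capture it instead of halting the script. This is a minimal sketch (outcome is an illustrative name); the exact wording of the condition message may vary between testthat versions.

```r
library(testthat)

# Run the failing test, but catch the "Test failed" error so the
# script keeps going; conditionMessage() extracts the error text
outcome <- tryCatch(
  test_that("make sure we did the thing right", {
    expect_equal(object = FALSE, expected = TRUE)
  }),
  error = function(e) conditionMessage(e)
)

# outcome now holds the error message rather than a logical
print(outcome)
```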

describe()

However, what happens if we wrap the same tests in describe()? A passing test returns a message and a NULL value, but a failing test returns output identical to a test that isn’t wrapped in describe():

# describe() + test_that() passing test 
unclass(
  describe("verify we implemented the right thing", {
    test_that("make sure we did the thing right", {
      expect_equal(object = TRUE, expected = TRUE)
    })
  })
)
#> Test passed
#> NULL
# describe() + test_that() failing test 
unclass(
  describe("verify we implemented the right thing", {
    test_that("make sure we did the thing right", {
      expect_equal(object = FALSE, expected = TRUE)
    })
  })
)
#> -- Failure: make sure we did the thing right -----------------------------------
#> FALSE (`actual`) not equal to TRUE (`expected`).
#> 
#> `actual`:   FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed

We can see describe() returns a NULL value instead of the logical result from test_that() (but a failing test is still reported in full and raises the same error).
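To inspect this in your own session, store the result of describe() and check it. The output above shows NULL, but what describe() returns is an implementation detail that may differ between testthat versions, so treat this sketch as a way to observe the value rather than a guarantee; res_describe is an illustrative name.

```r
library(testthat)

# Capture whatever describe() returns for a passing nested test
res_describe <- describe("verify we implemented the right thing", {
  test_that("make sure we did the thing right", {
    expect_equal(object = TRUE, expected = TRUE)
  })
})

# Compare against the NULL value shown in the output above
is.null(res_describe)
```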

Use case: expect_snapshot()

I’ll use a known issue with expect_snapshot() and test_that() to demonstrate how describe() saves the day. Assume we have a function that generates output too complex to test with expect_equal():

pickler_logo()
#> 
#>              d8b          888      888
#>              Y8P          888      888
#>                           888      888
#>     88888b.  888  .d8888b 888  888 888  .d88b.  888d888
#>     888 "88b 888 d88P"    888 .88P 888 d8P  Y8b 888P"
#>     888  888 888 888      888888K  888 88888888 888
#>     888 d88P 888 Y88b.    888 "88b 888 Y8b.     888
#>     88888P"  888  "Y8888P 888  888 888  "Y8888  888
#>     888
#>     888
#>     888
#>

We don’t want to visually inspect differences in pickler_logo(), so we’ll use a snapshot test to capture a baseline for comparison. Let’s start with test_that().

Initial test_that() snapshot test

I’ve included a feature description in the desc argument of test_that() for pickler_logo():

test_that("
  Feature: Text-based Logo Generation
  As a user who calls the pickler_logo() function
  I want to generate a text-based logo
  So that I can quickly insert the pickler logo", code = {
  expect_snapshot(pickler_logo())
})

devtools::test_active_file()

After the initial call to test_active_file(), testthat saves the snapshot in a markdown file under tests/testthat/_snaps/:

# Feature: Text-based Logo Generation
  As a user who calls the pickler_logo() function
  I want to generate a text-based logo
  So that I can quickly insert the pickler logo

    Code
      pickler_logo()
    Output
      
                   d8b          888      888
                   Y8P          888      888
                                888      888
          88888b.  888  .d8888b 888  888 888  .d88b.  888d888
          888 "88b 888 d88P"    888 .88P 888 d8P  Y8b 888P"
          888  888 888 888      888888K  888 88888888 888
          888 d88P 888 Y88b.    888 "88b 888 Y8b.     888
          88888P"  888  "Y8888P 888  888 888  "Y8888  888
          888
          888
          888

Subsequent test_that() test runs

Later, we change pickler_logo() by replacing each \" with '.¹ After loading the changes, we notice that running the test again generates a new snapshot file (containing the new version of the logo) instead of flagging the difference:

# Feature: Text-based Logo Generation
  As a user who calls the pickler_logo() function
  I want to generate a text-based logo
  So that I can quickly insert the pickler logo

    Code
      pickler_logo()
    Output
      
                   d8b          888      888
                   Y8P          888      888
                                888      888
          88888b.  888  .d8888b 888  888 888  .d88b.  888d888
          888 '88b 888 d88P'    888 .88P 888 d8P  Y8b 888P'
          888  888 888 888      888888K  888 88888888 888
          888 d88P 888 Y88b.    888 '88b 888 Y8b.     888
          88888P'  888  'Y8888P 888  888 888  'Y8888  888
          888
          888
          888
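The edit to pickler_logo() can be mimicked on a couple of logo lines with gsub(). This is a hypothetical illustration only: logo_lines is a made-up excerpt copied from the output above, not pickler’s actual source code. It shows how small the change is (only the quote characters differ, and every line keeps its width), which is exactly the kind of difference a snapshot test should catch.

```r
# Two lines copied from the logo output (hypothetical excerpt)
logo_lines <- c(
  '    888 "88b 888 d88P"    888 .88P 888 d8P  Y8b 888P"',
  '    88888P"  888  "Y8888P 888  888 888  "Y8888  888'
)

# Replace every double quote with a single quote; fixed = TRUE
# treats the pattern as a literal string rather than a regex
new_logo_lines <- gsub('"', "'", logo_lines, fixed = TRUE)

cat(new_logo_lines, sep = "\n")
```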

We can compare these outputs directly by saving a copy of each snapshot markdown file in tests/testthat/fixtures/ and reading them with readLines() and test_path():

#> ../tests/testthat/fixtures
#> ├── make-pickler_logo_snap.R
#> ├── pickler_logo2_snap.md
#> └── pickler_logo_snap.md

waldo::compare(
  x = readLines(con = test_path("fixtures", "pickler_logo_snap.md")),
  y = readLines(con = test_path("fixtures", "pickler_logo2_snap.md"))
)

This shows the changes, but because the lines are printed as quoted strings, the double quotes appear escaped (\"):

old[3:12] vs new[3:12]
  "                   Y8P          888      888"
  "                                888      888"
  "          88888b.  888  .d8888b 888  888 888  .d88b.  888d888"
- "          888 \"88b 888 d88P\"    888 .88P 888 d8P  Y8b 888P\""
+ "          888 '88b 888 d88P'    888 .88P 888 d8P  Y8b 888P'"
  "          888  888 888 888      888888K  888 88888888 888"
- "          888 d88P 888 Y88b.    888 \"88b 888 Y8b.     888"
+ "          888 d88P 888 Y88b.    888 '88b 888 Y8b.     888"
- "          88888P\"  888  \"Y8888P 888  888 888  \"Y8888  888"
+ "          88888P'  888  'Y8888P 888  888 888  'Y8888  888"
  "          888"
  "          888"
  "          888"
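If the escaped quotes are distracting, recent waldo releases provide a quote_strings argument to waldo::compare(); setting it to FALSE prints the lines without surrounding quotes, closer to the snapshot diff testthat shows. The sketch below is self-contained under that assumption: it writes two small hypothetical files to tempfile() paths instead of reading the fixtures, so old_path and new_path are illustrative names.

```r
# Hypothetical stand-ins for the two snapshot fixtures
old_path <- tempfile(fileext = ".md")
new_path <- tempfile(fileext = ".md")
writeLines(c("    888 \"88b 888", "    888  888 888"), old_path)
writeLines(c("    888 '88b 888", "    888  888 888"), new_path)

# Compare the files line by line; quote_strings = FALSE drops the
# surrounding quotes (and the escaped \" characters) from the output
diffs <- waldo::compare(
  x = readLines(con = old_path),
  y = readLines(con = new_path),
  quote_strings = FALSE
)
diffs
```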

However, if we move the contents of desc into the description argument of describe(), testthat detects the changes and asks us to review the snapshot.²

describe(
  "Feature: Text-based Logo Generation
    As a user who calls the pickler_logo() function
    I want to generate a text-based logo
    So that I can quickly insert the pickler logo", code = {
      test_that("pickler_logo()", code = {
        expect_snapshot(pickler_logo())
      })
})

devtools::test_active_file()

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

── Failure (test-pickler_logo.R:8:9): pickler_logo ─────────────────────────────
Snapshot of code has changed:
old[6:15] vs new[6:15]
                 Y8P          888      888
                              888      888
        88888b.  888  .d8888b 888  888 888  .d88b.  888d888
-       888 "88b 888 d88P"    888 .88P 888 d8P  Y8b 888P"
+       888 '88b 888 d88P'    888 .88P 888 d8P  Y8b 888P'
        888  888 888 888      888888K  888 88888888 888
-       888 d88P 888 Y88b.    888 "88b 888 Y8b.     888
+       888 d88P 888 Y88b.    888 '88b 888 Y8b.     888
-       88888P"  888  "Y8888P 888  888 888  "Y8888  888
+       88888P'  888  'Y8888P 888  888 888  'Y8888  888
        888
        888
        888

* Run testthat::snapshot_accept('pickler_logo') to accept the change.
* Run testthat::snapshot_review('pickler_logo') to interactively review the change.
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

This example shows how describe() lets us include context for tests while keeping that context scoped to the tests within each describe() call.
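The same scoping works with it() in place of test_that(), which is the more conventional BDD pairing. This sketch uses base R’s toupper() and nchar() as stand-in subjects so it runs without pickler; the feature descriptions are illustrative only.

```r
library(testthat)

# Each describe() block carries its own context for the tests inside it
describe("Feature: upper-case conversion with toupper()", {
  it("converts lower-case letters", {
    expect_equal(toupper("pickler"), "PICKLER")
  })
  it("leaves digits unchanged", {
    expect_equal(toupper("abc123"), "ABC123")
  })
})

describe("Feature: string length with nchar()", {
  it("counts characters", {
    expect_equal(nchar("pickler"), 7L)
  })
})
```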

BDD Recap

Multiline descriptions in the desc argument of test_that() can cause issues when used with expect_snapshot(), but moving the multiline text into a call to describe() prevents the test from overwriting the baseline snapshot file.

The functionality provided by describe() is an excellent improvement to testthat tests, because we’re no longer confined to comments and the desc argument of test_that() to document what exactly the test is testing.