Behavior-driven development functions
testthat’s BDD functions (specifically describe()) allow us to include additional context for tests directly preceding the call to it() or test_that(). TRUE/FALSE results are at the heart of all unit tests (i.e., “do the results meet my expectations?” Check ‘yes’ or ‘no’).
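To make the structure concrete, here’s a minimal sketch of the describe()/it() pattern on its own. The add_one() function is a hypothetical stand-in (it isn’t part of pickler). The point is only that describe() carries the shared context while each it() holds the expectations for one behavior.

```r
library(testthat)

# Hypothetical function under test (illustration only, not from pickler)
add_one <- function(x) x + 1

# describe() supplies the shared context for the behaviors inside it;
# each it() block groups the expectations for one behavior
describe("add_one()", {
  it("adds one to a single number", {
    expect_equal(add_one(1), 2)
  })
  it("works element-wise on a numeric vector", {
    expect_equal(add_one(c(1, 2)), c(2, 3))
  })
})
```

Run interactively, each passing it() prints a short success message, much like the test_that() calls in the rest of this post.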
it()
It’s a little unorthodox, but we can compare the output of passing it() and test_that() tests using waldo::compare() (the function testthat’s 3rd edition relies on in expectations like expect_equal()).
waldo::compare(
test_that("make sure we did the thing right", {
expect_equal(object = TRUE, expected = TRUE)
}),
it("make sure we did the thing right", {
expect_equal(object = TRUE, expected = TRUE)
})
)
#> Test passed
#> Test passed
#> ✔ No differences
The two functions behave identically here. The ‘secret sauce’ of test_that() and it() is their ability to run expect_*() functions and return a logical result (TRUE/pass or FALSE/fail).
unclass(
test_that("make sure we did the thing right", {
expect_equal(object = TRUE, expected = TRUE)
})
)
#> Test passed
#> [1] TRUE
unclass(
it("make sure we did the thing right", {
expect_equal(object = TRUE, expected = TRUE)
})
)
#> Test passed
#> [1] TRUE
If we attempt to unclass() the results of a failing test, we see that the failed expectation in expect_equal() triggers an error (and both the actual and expected values are passed to waldo::compare()):
unclass(
test_that("make sure we did the thing right", {
expect_equal(object = FALSE, expected = TRUE)
})
)
#> -- Failure: make sure we did the thing right -----------------------------------
#> FALSE (`actual`) not equal to TRUE (`expected`).
#>
#> `actual`: FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed
unclass(
it("make sure we did the thing right", {
expect_equal(object = FALSE, expected = TRUE)
})
)
#> -- Failure: make sure we did the thing right -----------------------------------
#> FALSE (`actual`) not equal to TRUE (`expected`).
#>
#> `actual`: FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed
describe()
However, what happens if we wrap the same tests in describe()? A passing test returns a message and a NULL value, but failing tests return output identical to tests that aren’t wrapped in describe():
# describe() + test_that() passing test
unclass(
describe("verify we implemented the right thing", {
test_that("make sure we did the thing right", {
expect_equal(object = TRUE, expected = TRUE)
})
})
)
#> Test passed
#> NULL
# describe() + test_that() failing test
unclass(
describe("verify we implemented the right thing", {
test_that("make sure we did the thing right", {
expect_equal(object = FALSE, expected = TRUE)
})
})
)
#> -- Failure: make sure we did the thing right -----------------------------------
#> FALSE (`actual`) not equal to TRUE (`expected`).
#>
#> `actual`: FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed
We can see describe() returns a NULL value instead of the logical result from expect_equal() (but it still reports the results of the failed test).
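Because a failing test inside describe() still ends in an ordinary R error, we can capture it with base R’s tryCatch() if we want to inspect the failure instead of stopping. This sketch swaps the inner test_that() for it(), the form describe() is designed around. The exact failure text printed along the way may vary between testthat versions.

```r
library(testthat)

# Capture the "Test failed" error instead of letting it halt the script
result <- tryCatch(
  describe("verify we implemented the right thing", {
    it("make sure we did the thing right", {
      expect_equal(object = FALSE, expected = TRUE)
    })
  }),
  error = function(e) e
)

# The failure surfaces as a regular R error condition object
inherits(result, "error")
conditionMessage(result)
```

Here conditionMessage(result) should carry the same "Test failed" message seen in the outputs above.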
Use case: expect_snapshot()
I’ll use a known issue with expect_snapshot() and test_that() to demonstrate how describe() saves the day. Assume we have a function that generates output too complex to test with expect_equal():
pickler_logo()
#>
#> d8b 888 888
#> Y8P 888 888
#> 888 888
#> 88888b. 888 .d8888b 888 888 888 .d88b. 888d888
#> 888 "88b 888 d88P" 888 .88P 888 d8P Y8b 888P"
#> 888 888 888 888 888888K 888 88888888 888
#> 888 d88P 888 Y88b. 888 "88b 888 Y8b. 888
#> 88888P" 888 "Y8888P 888 888 888 "Y8888 888
#> 888
#> 888
#> 888
#>
We don’t want to visually inspect differences in pickler_logo(), so we’ll use a snapshot test to capture a baseline for comparison. Let’s start with test_that().
Initial test_that() snapshot test
I’ve included a feature description for pickler_logo() in the desc argument:
test_that("
Feature: Text-based Logo Generation
As a user who calls the pickler_logo() function
I want to generate a text-based logo
So that I can quickly insert the pickler logo", code = {
expect_snapshot(pickler_logo())
})
devtools::test_active_file()
After the initial call to test_active_file(), testthat tells us the snapshot file was saved in tests/testthat/_snaps/:
# Feature: Text-based Logo Generation
As a user who calls the pickler_logo() function
I want to generate a text-based logo
So that I can quickly insert the pickler logo
Code
pickler_logo()
Output
d8b 888 888
Y8P 888 888
888 888
88888b. 888 .d8888b 888 888 888 .d88b. 888d888
888 "88b 888 d88P" 888 .88P 888 d8P Y8b 888P"
888 888 888 888 888888K 888 88888888 888
888 d88P 888 Y88b. 888 "88b 888 Y8b. 888
88888P" 888 "Y8888P 888 888 888 "Y8888 888
888
888
888
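The heading in that snapshot file is built from the test description, and because desc was typed across several lines, the string itself contains newline characters. A quick base R check, re-using the desc text from the test above, makes this visible. Treating these embedded newlines as the trigger for the snapshot problem is an assumption on my part, not something I’ve confirmed in testthat’s documentation.

```r
# The desc string from the test above; R keeps the literal line breaks
desc <- "
Feature: Text-based Logo Generation
As a user who calls the pickler_logo() function
I want to generate a text-based logo
So that I can quickly insert the pickler logo"

# Split on the embedded newlines to see how many lines the "title" really has
desc_lines <- strsplit(desc, "\n", fixed = TRUE)[[1]]
length(desc_lines) # 5: an empty first line plus four lines of feature text
```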
Subsequent test_that() test runs
Later, we change pickler_logo() by replacing the \" with '.¹ After loading the changes, we notice that when we run the test again, a new snapshot file is generated (overwriting the baseline with the new version of the logo):
# Feature: Text-based Logo Generation
As a user who calls the pickler_logo() function
I want to generate a text-based logo
So that I can quickly insert the pickler logo
Code
pickler_logo()
Output
d8b 888 888
Y8P 888 888
888 888
88888b. 888 .d8888b 888 888 888 .d88b. 888d888
888 '88b 888 d88P' 888 .88P 888 d8P Y8b 888P'
888 888 888 888 888888K 888 88888888 888
888 d88P 888 Y88b. 888 '88b 888 Y8b. 888
88888P' 888 'Y8888P 888 888 888 'Y8888 888
888
888
888
We can compare these outputs directly by saving each markdown file in tests/testthat/fixtures/ and reading them into a test with test_path():
#> ../tests/testthat/fixtures
#> ├── make-pickler_logo_snap.R
#> ├── pickler_logo2_snap.md
#> └── pickler_logo_snap.md
waldo::compare(
x = readLines(con = test_path("fixtures", "pickler_logo_snap.md")),
y = readLines(con = test_path("fixtures", "pickler_logo2_snap.md"))
)
This shows the changes, but because readLines() returns each line as a character string, the double quotes are printed with escape characters (\"):
old[3:12] vs new[3:12]
" Y8P 888 888"
" 888 888"
" 88888b. 888 .d8888b 888 888 888 .d88b. 888d888"
- " 888 \"88b 888 d88P\" 888 .88P 888 d8P Y8b 888P\""
+ " 888 '88b 888 d88P' 888 .88P 888 d8P Y8b 888P'"
" 888 888 888 888 888888K 888 88888888 888"
- " 888 d88P 888 Y88b. 888 \"88b 888 Y8b. 888"
+ " 888 d88P 888 Y88b. 888 '88b 888 Y8b. 888"
- " 88888P\" 888 \"Y8888P 888 888 888 \"Y8888 888"
+ " 88888P' 888 'Y8888P 888 888 888 'Y8888 888"
" 888"
" 888"
" 888"
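For a self-contained illustration of the escaping, this sketch writes two made-up one-line “snapshot” files to temporary paths (they are not the real pickler snapshots). It assumes a waldo release that supports the quote_strings argument to compare(); setting it to FALSE prints the lines unquoted, closer to how testthat displays snapshot diffs.

```r
library(waldo)

# Two hypothetical one-line "snapshot" files: the first uses double quotes,
# the second uses single quotes (mirroring the change to pickler_logo())
old_snap <- tempfile(fileext = ".md")
new_snap <- tempfile(fileext = ".md")
writeLines('888 "88b 888 d88P"', old_snap)
writeLines("888 '88b 888 d88P'", new_snap)

old_lines <- readLines(old_snap)
new_lines <- readLines(new_snap)

# The file holds a plain double quote, but the quoted diff escapes it
diffs <- compare(old_lines, new_lines)
diffs

# Assumption: quote_strings = FALSE drops the surrounding quotes and escapes
unquoted <- compare(old_lines, new_lines, quote_strings = FALSE)
unquoted
```

Printing the raw lines with cat(old_lines, new_lines, sep = "\n") is another way to see the file contents without any escaping.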
However, if the contents of desc are placed in the description argument of describe(), we see testthat detects the changes and asks us to review the snapshots.²
describe(
"Feature: Text-based Logo Generation
As a user who calls the pickler_logo() function
I want to generate a text-based logo
So that I can quickly insert the pickler logo", code = {
test_that("pickler_logo()", code = {
expect_snapshot(pickler_logo())
})
})
devtools::test_active_file()
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
── Failure (test-pickler_logo.R:8:9): pickler_logo ─────────────────────────────
Snapshot of code has changed:
old[6:15] vs new[6:15]
Y8P 888 888
888 888
88888b. 888 .d8888b 888 888 888 .d88b. 888d888
- 888 "88b 888 d88P" 888 .88P 888 d8P Y8b 888P"
+ 888 '88b 888 d88P' 888 .88P 888 d8P Y8b 888P'
888 888 888 888 888888K 888 88888888 888
- 888 d88P 888 Y88b. 888 "88b 888 Y8b. 888
+ 888 d88P 888 Y88b. 888 '88b 888 Y8b. 888
- 88888P" 888 "Y8888P 888 888 888 "Y8888 888
+ 88888P' 888 'Y8888P 888 888 888 'Y8888 888
888
888
888
* Run testthat::snapshot_accept('pickler_logo') to accept the change.
* Run testthat::snapshot_review('pickler_logo') to interactively review the change.
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
This is an example of how describe() gives us the ability to include context for tests (while keeping the scope confined to the tests within each describe() call).
BDD Recap
Multiline descriptions in the desc argument of test_that() can cause issues when used with expect_snapshot(), but wrapping the multiline description in a call to describe() prevents the test from overwriting the baseline snapshot file.
The functionality provided by describe() is an excellent improvement to testthat tests, because we’re no longer confined to comments and the desc argument of test_that() to document exactly what the test is testing.