What Do You Mean Test Coverage?!

Efficient Testing for Shiny Apps

Martin Frigaard (Atorus)

Introduction


A bit about me…



As a shiny developer
I want to focus on writing tests that matter 
So that I can spend less time testing code...

Agenda


Shiny testing

  • Unit tests

  • Integration tests

  • System tests

  • Test tools

    • Fixtures
    • Helpers

Development

  • Standard app development

  • Behavior-driven development

    • Features
    • Scenarios

Efficient Tests

  • What should I test?

  • How should I test it?

  • Code coverage

Shiny Testing

Testing your shiny app


  • Is much easier if your app is in a package

   …but it’s not impossible if it isn’t

  • Requires additional packages and/or functions beyond testthat
  • Benefits from having a well-designed test suite
  • Requires understanding the relationship between R/ files and test- files


Unit Tests


  • Package(s): testthat


  • Focuses on specific units of code, ensuring each function or component behaves as intended.


Example: Verifying that a function correctly calculates a specific value based on the input.
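
A minimal sketch of such a unit test, assuming testthat is installed. The bmi() function is hypothetical and stands in for any piece of business logic; it is not part of the app shown in these slides.

```r
library(testthat)

# Hypothetical business-logic function (an assumption for illustration)
bmi <- function(weight_kg, height_m) {
  stopifnot(is.numeric(weight_kg), is.numeric(height_m), height_m > 0)
  weight_kg / height_m^2
}

test_that("bmi() divides weight by height squared", {
  # 80 / 2^2 = 20
  expect_equal(bmi(weight_kg = 80, height_m = 2), 20)
  # invalid input should fail loudly instead of returning Inf
  expect_error(bmi(weight_kg = 80, height_m = 0))
})
```

The test names one behavior and checks both a known value and a failure case, which keeps the cause obvious when it breaks.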

Integration Tests


  • Package(s): shiny (testServer()) & testthat


  • Confirms app/module server functions operate as expected


Example: Making sure that modules communicate and display the results from a specific function or calculation
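
A minimal sketch of an integration test with shiny::testServer(). The module mod_vars_server() is hypothetical (it is not the app from these slides); it returns the user's selections as a reactive list so another module could consume them.

```r
library(shiny)
library(testthat)

# Hypothetical module server: returns selected column names as a reactive
mod_vars_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    reactive(list(x = input$x, y = input$y))
  })
}

test_that("mod_vars_server() returns the selected inputs", {
  testServer(mod_vars_server, expr = {
    # simulate the user choosing two columns
    session$setInputs(x = "rating", y = "length")
    # session$returned holds the module's return value (a reactive here)
    expect_equal(session$returned(), list(x = "rating", y = "length"))
  })
})
```

Because testServer() runs the server function without a browser, tests like this are a cheap way to check what one module hands to the next.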

System Tests


  • Package(s): shiny & shinytest2


  • Confirms all parts of the app behave correctly and provide a good user experience.


Example: Simulating a user’s experience with the application, selecting inputs, entering data, and ensuring the application responds correctly and displays the expected outputs.

Test Fixtures

Test fixtures are used to create repeatable test conditions


  • Good fixtures provide a consistent, well-defined test environment.


  • Fixtures are removed/destroyed after the test is executed.


  • This ensures any changes made during the test don’t persist or interfere with future tests.
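
One way to get this set-up-and-tear-down behavior is the withr package, sketched below. The helper local_movies_csv() and its two-row data frame are assumptions made for illustration; they are not part of the app.

```r
library(testthat)

# Hypothetical fixture helper: writes a small CSV to a temporary file that is
# deleted automatically when the calling frame (e.g., the test) exits
local_movies_csv <- function(env = parent.frame()) {
  path <- withr::local_tempfile(fileext = ".csv", .local_envir = env)
  write.csv(data.frame(rating = c(6.1, 7.4)), path, row.names = FALSE)
  path
}

test_that("the fixture file is available during the test", {
  path <- local_movies_csv()
  expect_true(file.exists(path))
  expect_equal(nrow(read.csv(path)), 2)
})
```

Passing the caller's environment through .local_envir ties the cleanup to the test rather than to the helper, so the file outlives the helper call but not the test.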

Example: custom plot function


My app has the following utility function for creating a ggplot2 scatter plot:


# Build a scatter plot from column names passed as strings
scatter_plot <- function(df, x_var, y_var, col_var, alpha_var, size_var) {
  ggplot2::ggplot(
    data = df,
    ggplot2::aes(
      x = .data[[x_var]],
      y = .data[[y_var]],
      color = .data[[col_var]]
    )
  ) +
    ggplot2::geom_point(alpha = alpha_var, size = size_var)
}


The .data pronoun from rlang (.data[[ ]]) lets the function accept column names as strings (e.g., the values of input$x and input$y)

Example: test fixture


Test fixtures can be stored in tests/testthat/fixtures/


tests/
  ├── testthat/
     ├── fixtures/                                         
     │   ├── make-tidy_ggp2_movies.R 
     │   └── tidy_ggp2_movies.rds 
     ├── helper.R                                          
     └── test-scatter_plot.R                                     
  └── testthat.R


The make-tidy_ggp2_movies.R script creates a ‘tidy’ version of ggplot2movies::movies.

Using test fixtures


Static data fixtures can be accessed with testthat::test_path():

test_that("tidy_ggp2_movies.rds works", code = {
  tidy_ggp2_movies <- readRDS(test_path("fixtures", "tidy_ggp2_movies.rds"))
  app_graph <- scatter_plot(tidy_ggp2_movies,
                            x_var = 'rating',
                            y_var = 'budget',
                            col_var = 'mpaa',
                            alpha_var = 3/4,
                            size_var = 2.5)
  expect_true(ggplot2::is.ggplot(app_graph))
})
  • If tidy_ggp2_movies.rds is used in a few tests, move make-tidy_ggp2_movies.R into data-raw/ and make tidy_ggp2_movies part of the package
  • ggplot2::is.ggplot() confirms a plot object has been built (doesn’t require a snapshot test)

Test helpers

Test helpers reduce repeated/duplicated test code


Objects that aren’t large enough to justify storing as static test fixtures can be created with helper functions


Helpers can be stored in tests/testthat/helper.R


tests/
  ├── testthat/
     ├── fixtures/
     │   ├── make-tidy_ggp2_movies.R
     │   └── tidy_ggp2_movies.rds
     ├── helper.R
     └── test-scatter_plot.R    
  └── testthat.R

Example: test helper


Assume I want a list of inputs to pass to scatter_plot() in my test:

ggp2_scatter_inputs <- list(  
        x = "rating",
        y = "length",
        z = "mpaa",
        alpha = 0.75,
        size = 3,
        plot_title = "Enter plot title"
)


I could store these values in a function in tests/testthat/helper.R

var_inputs <- function() {
  list(
    x = "rating",
    y = "length",
    z = "mpaa",
    alpha = 0.75,
    size = 3,
    plot_title = "Enter plot title"
  )
}

Using test helpers


This removes duplicated code…


test_that("scatter_plot() works", code = {

  tidy_ggp2_movies <- readRDS(test_path("fixtures", "tidy_ggp2_movies.rds"))

  app_graph <- scatter_plot(tidy_ggp2_movies,
                            x_var = var_inputs()$x,
                            y_var = var_inputs()$y,
                            col_var = var_inputs()$z,
                            alpha_var = var_inputs()$alpha,
                            size_var = var_inputs()$size)

  testthat::expect_true(ggplot2::is.ggplot(app_graph))
})

…but it’s unclear where var_inputs() comes from (or what it contains)

Tips on test helpers

If you have repeated code in your tests, consider the following questions before creating a helper function:

  1. Does the code help explain what behavior is being tested?

  2. Would a helper make it harder to debug the test when it fails?

Consider a function like make_ggp2_inputs():


list(x = 'rating',
     y = 'length',
     z = 'mpaa',
     alpha = 0.75,
     size = 3,
     plot_title = 'Enter plot title'
     )

Development

Traditional development

Focuses on coding an application’s functionality


  • User specifications often go beyond what they ‘need’ and include solutions
  • Specifications with solutions bind developers to a particular implementation
  • Developers will then focus on the technical implementation and not finding the optimal solution
  • Can lead to delays evaluating the testability of features until late in the project

Behavior-driven development

Create applications that meet desired behaviors


  • Starts with understanding the user needs
  • Acknowledges specifications and requirements will change and evolve
  • Identifies and prioritizes features that deliver value
  • Uses scenarios for guiding how to test and build features

How does BDD work?

Users and developers work together to develop a clear vision of the app’s value


Ongoing discussions between users and developers improve understanding of the problem by:


  • Uncovering hidden assumptions
  • Considering any potential risks
  • Building a shared appreciation for meeting user needs and achieving business goals

Features

Features are tangible functionalities that facilitate achieving a business goal


  1. Who wants the feature?
  2. What action does the feature perform?
  3. What is the intended business value?

Gherkin:

Feature: < what is being built to deliver the proposed value >
  As a < user/stakeholder >
  I want to < perform some action >
  So that I can < achieve a business goal >

Describing features

Users and developers write stories to describe a feature’s expected outcome


  1. Uses a first-person voice
  2. States what users need
  3. States why users need it

Feature: CDISC Variable Exploration Dashboard
  As a researcher or analyst 
  I want to explore variables in the Vital Signs Analysis Dataset (ADVS)
  So that I can analyze and derive insights from the vital signs data

Scenarios

Describe application behaviors in a plain, human-readable format


  • Given: establishes preconditions for the scenario.
  • When: specifies an action being tested.
  • Then: defines what outcomes to expect.

Gherkin:

  Scenario: < concrete example >
    Given < initial conditions >
    When  < action to test >
    Then  < expected outcome >

Writing scenarios


Set the stage

Feature: User login

  Scenario: Successful login with correct credentials
    Given the login page is loaded

Action and outcome

    When the user enters a valid username and password
    Then the user should be redirected to the dashboard

More outcomes

    And the user should see the landing page


Alternate story

  Scenario: Unsuccessful login with incorrect password
    Given the login page is loaded
    When the user enters a valid username but an incorrect password
    Then the user should see an error message stating 'Invalid password.'

BDD with Shiny

Efficient testing means writing scenarios that cover critical paths.


  • Ideally everything in your app is tested
  • In reality, decisions have to be made about what to test
  • If developers and users have collaboratively defined key features and expected behaviors, prioritize those scenarios
  • Adopt an ‘inspect and adapt’ posture

testthat BDD support


testthat has describe() and it() functions for features and scenarios:


testthat::describe(
  "Feature: Scatter plot data visualization
     As a film data analyst
     I want to explore movie review data from IMDB.com
     So that I can analyze relationships between movie review metrics",
  code = {
    testthat::it(
      "Scenario: Create scatter plot
         Given I have launched the movie review exploration app,
         When I view the scatter plot,
         Then I should see points representing values for a default
              set of continuous and categorical columns.",
      code = {
        # test code
      })
  })
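
As a rough sketch, the # test code placeholder could be filled in along these lines. This version tests the scatter_plot() utility directly rather than the running app, and it uses the built-in mtcars data in place of the movie data; both choices, along with the scenario wording, are assumptions made to keep the example self-contained.

```r
library(testthat)

# Same utility function as shown earlier in these slides
scatter_plot <- function(df, x_var, y_var, col_var, alpha_var, size_var) {
  ggplot2::ggplot(
    data = df,
    ggplot2::aes(x = .data[[x_var]], y = .data[[y_var]],
                 color = .data[[col_var]])
  ) +
    ggplot2::geom_point(alpha = alpha_var, size = size_var)
}

describe("Feature: Scatter plot data visualization", {
  it("Scenario: Create scatter plot
       Given a data frame with continuous columns,
       When I pass column names as strings,
       Then I should see one point per row.", {
    # mtcars stands in for the movie data (assumption for this sketch)
    p <- scatter_plot(mtcars, x_var = "wt", y_var = "mpg", col_var = "cyl",
                      alpha_var = 0.75, size_var = 3)
    expect_true(inherits(p, "ggplot"))
    # layer_data() builds the plot and returns the data drawn by layer 1
    expect_equal(nrow(ggplot2::layer_data(p, 1)), nrow(mtcars))
  })
})
```

Keeping the Gherkin text in the description means a failing test reports the scenario it belongs to.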

What to test

Efficient testing

  • Unit tests
    • All business logic (models, calculations, etc.) should have unit tests (no getting around this)
  • Integration tests
    • shiny::testServer() tests should focus on ‘handshakes’ between modules and exporting/saving data
  • System/end-to-end tests
    • Prioritize the feature/scenario that most directly observes the intended business goal with shinytest2

Test coverage


  • Test coverage measures which execution paths the tests run (i.e., it treats all paths as equally important)
  • Test coverage can’t check if our app meets user expectations
  • Test coverage is a valuable metric, but shouldn’t be the sole criterion for assessing your app
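
The points above can be illustrated with the covr package (an assumption; the slides do not name a coverage tool). In this sketch a hypothetical avg() function contains a deliberate bug, and the "test" only calls it without checking the result.

```r
# Source file with a deliberate bug: avg() should call mean(), not sum()
src <- tempfile(fileext = ".R")
writeLines(c(
  "avg <- function(x) {",
  "  sum(x)",
  "}"
), src)

# "Test" file that runs avg() but never checks what it returns
tst <- tempfile(fileext = ".R")
writeLines("avg(c(2, 4))", tst)

# Every line of avg() is executed, so the coverage tool has nothing to flag
cov <- covr::file_coverage(source_files = src, test_files = tst)
covr::percent_coverage(cov)
```

Coverage records that the line ran, not that avg(c(2, 4)) returned the wrong number; only an expectation tied to the intended behavior would catch that.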

Remember



Today’s functionality is tomorrow’s regression



Apps need thorough and continuous testing during development so new features don’t break existing functionality

Thanks!