Motivation

The goal of dfdiffs is to answer three questions about two versions of the same data frame:

  1. What rows are here now that weren’t here before?
  2. What rows were here before that aren’t here now?
  3. What values have been changed?
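
As a rough illustration of what each question means, the sketch below answers all three in base R on two tiny made-up tables. The names `base`, `compare`, `id`, and `height` are hypothetical, and this is not how dfdiffs is implemented; it only shows the underlying idea of comparing rows by a shared key.

```r
# Hypothetical toy tables keyed by `id` (not data from the package)
base <- data.frame(id = c("a", "b", "c"), height = c(70, 71, 72),
                   stringsAsFactors = FALSE)
compare <- data.frame(id = c("b", "c", "d"), height = c(71, 75, 68),
                      stringsAsFactors = FALSE)

# 1. Rows here now that weren't here before: keys only in `compare`
new_rows <- compare[!compare$id %in% base$id, ]

# 2. Rows here before that aren't here now: keys only in `base`
deleted_rows <- base[!base$id %in% compare$id, ]

# 3. Changed values: keys in both tables whose `height` differs
both <- merge(base, compare, by = "id", suffixes = c("_base", "_compare"))
changed <- both[both$height_base != both$height_compare, ]
```

The plain `!=` comparison in step 3 is enough here only because the toy data has no missing values; with `NA`s it would need extra handling.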

Test data

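The examples use two versions of the Master table from the Lahman baseball database, shipped with the package as master15 and master20. A random sample of 3,000 rows is drawn from each.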

m15 <- dfdiffs::master15 |> 
  dplyr::slice_sample(n = 3000, replace = FALSE)
max(m15$debut, na.rm = TRUE)
#> [1] "2015-09-27"
m20 <- dfdiffs::master20 |> 
  dplyr::slice_sample(n = 3000, replace = FALSE)
max(m20$debut, na.rm = TRUE)
#> [1] "2019-09-09"

The compare_data() function

library(dfdiffs)
# Argument template: compare_data(compare = , base = , by = , by_col = , cols = )
comparisons <- compare_data(
  compare = m20, base = m15, 
  by = "playerID", by_col = "join", 
  cols = c("nameFirst", "nameLast", "nameGiven", "height"))
names(comparisons)
#> [1] "new_data"          "deleted_data"      "changed_num_diffs"
#> [4] "changed_var_diffs"

$new_data

head(comparisons$new_data)

join       nameFirst  nameLast  nameGiven      height
burnspe01  Pete       Burnside  Peter Willits      74
taylodu01  Dummy      Taylor    Luther Haden       73
mechegi01  Gil        Meche     Gilbert Allen      75
wagnele01  Leon       Wagner    Leon Lamar         73
sturtta01  Tanyon     Sturtze   Tanyon James       77
stewafr01  Frank      Stewart   Frank              73

$deleted_data

head(comparisons$deleted_data)

join       nameFirst  nameLast  nameGiven        height
doylede01  Denny      Doyle     Robert Dennis        69
saundde01  Dennis     Saunders  Dennis James         75
baylodo01  Don        Baylor    Don Edward           73
bradyst01  Steve      Brady     Stephen A.           69
jordasl01  Slats      Jordan    Clarence Veasey      73
crozier01  Eric       Crozier   Eric Le Roi          76

$changed_num_diffs

comparisons$changed_num_diffs

variable   no_of_differences
nameFirst                  3
nameGiven                  5
height                     7
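
One way to arrive at a per-variable count like this is sketched below in base R. It assumes the two tables have already been joined on their key, with `_base` and `_compare` suffixes on the compared columns; the data, the column names, and the helper `differs()` are all hypothetical and are not taken from the dfdiffs source. A missing value replaced by a real one is counted as a difference, which matches the `NA` rows that appear in `$changed_var_diffs`.

```r
# Hypothetical joined data: one row per key found in both tables
both <- data.frame(
  id = c("a", "b", "c"),
  height_base = c(NA, 79, 75),
  height_compare = c(71, 78, 75),
  name_base = c("Dan", "Al", "Jo"),
  name_compare = c("Dan", "Al", "Joe"),
  stringsAsFactors = FALSE
)

# TRUE where x and y differ. NA vs. a value counts as a difference,
# NA vs. NA does not, and the result itself is never NA.
differs <- function(x, y) {
  (is.na(x) != is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
}

# Count the differences for each compared variable
vars <- c("height", "name")
counts <- vapply(vars, function(v) {
  sum(differs(both[[paste0(v, "_base")]], both[[paste0(v, "_compare")]]))
}, integer(1))

num_diffs <- data.frame(variable = vars, no_of_differences = unname(counts),
                        stringsAsFactors = FALSE)
```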

$changed_var_diffs

comparisons$changed_var_diffs

variable   join       base              compare
nameFirst  couloda01  Daniel            Danny
nameFirst  dorseje01  Jerry             Joseph
nameFirst  reynoch01  Charlie           Thomas
nameGiven  davisju01  James J.          James Joseph
nameGiven  dorseje01  Michael Jeremiah  Joseph Wilbur
nameGiven  jonesad01  Adam La Marque    Adam LaMarque
nameGiven  reynoch01  Charles E.        Thomas Hart
nameGiven  zimmejo02  Jordan M.         Jordan Michael
height     craigge01  NA                71
height     feldmsc01  79                78
height     maddefr01  NA                68
height     novaiv01   76                77
height     pressry01  75                74
height     thompta01  77                76
height     uptonju01  74                73
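
Because this table stacks names and heights in the same base and compare columns, the sketch below assumes those columns are stored as character. That is an assumption about the output, not something the output confirms. It rebuilds a few of the rows shown above by hand, keeps only the height rows, and converts them back to numbers to compute the size of each change. The `change` column is a hypothetical addition.

```r
# A few rows copied by hand from the output above;
# `base` and `compare` are assumed to be character columns
var_diffs <- data.frame(
  variable = c("nameFirst", "height", "height", "height"),
  join = c("couloda01", "craigge01", "feldmsc01", "novaiv01"),
  base = c("Daniel", NA, "79", "76"),
  compare = c("Danny", "71", "78", "77"),
  stringsAsFactors = FALSE
)

# Keep only the height rows, then convert back to numeric
height_diffs <- var_diffs[var_diffs$variable == "height", ]
height_diffs$change <-
  as.numeric(height_diffs$compare) - as.numeric(height_diffs$base)
```

Rows where the base value was missing, such as craigge01, get a change of `NA` rather than a number.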