Motivation
The goal of the dfdiffs
is to answer the following
questions:
- What rows are here now that weren’t here before?
- What rows were here before that aren’t here now?
3. What values have been changed?
This vignette takes us through two functions:
create_changed_data()
: which uses the
arsenal::comparedf()
function
create_modified_data()
: which uses the
diffdf::diffdf()
function
Both functions answers the “What values have been
changed?”
Changed data
We have two datasets to test what’s been modified:
Initial Data
InitialData
subject_id
|
record
|
text_value_a
|
text_value_b
|
created_date
|
updated_date
|
entered_date
|
A
|
1
|
Issue unresolved
|
Fatigue
|
2021-07-29
|
2021-09-29
|
2021-09-29
|
A
|
2
|
Issue unresolved
|
Fatigue
|
2021-07-29
|
2021-10-03
|
2021-10-29
|
B
|
3
|
Issue resolved
|
Fever
|
2021-07-16
|
2021-09-02
|
2021-08-18
|
C
|
4
|
Issue resolved
|
Joint pain
|
2021-08-24
|
2021-10-03
|
2021-10-03
|
C
|
5
|
Issue resolved
|
Joint pain
|
2021-08-24
|
2021-09-20
|
2021-10-20
|
Changed Data
ChangedData
subject_id
|
record
|
text_value_a
|
text_value_b
|
created_date
|
updated_date
|
entered_date
|
A
|
1
|
Issue resolved
|
Fatigue
|
2021-07-29
|
2021-10-03
|
2021-11-30
|
A
|
2
|
Issue resolved
|
Fatigue
|
2021-07-29
|
2021-11-27
|
2021-11-30
|
B
|
3
|
Issue resolved
|
Fever
|
2021-07-16
|
2021-10-20
|
2021-11-21
|
C
|
4
|
Issue resolved
|
Joint pain, stiffness and swelling
|
2021-08-24
|
2021-10-13
|
2021-11-11
|
C
|
5
|
Issue resolved
|
Joint pain
|
2021-08-24
|
2021-10-14
|
2021-11-16
|
Creating join columns
We will use our create_new_column()
function to create
join_var
from subject_id
and
record
ChangedDataJoin <- dfdiffs::create_new_column(
data = ChangedData,
cols = c("subject_id", "record"),
new_name = "join_var")
ChangedDataJoin
ChangedDataJoin
join_var
|
subject_id
|
record
|
text_value_a
|
text_value_b
|
created_date
|
updated_date
|
entered_date
|
A-1
|
A
|
1
|
Issue resolved
|
Fatigue
|
2021-07-29
|
2021-10-03
|
2021-11-30
|
A-2
|
A
|
2
|
Issue resolved
|
Fatigue
|
2021-07-29
|
2021-11-27
|
2021-11-30
|
B-3
|
B
|
3
|
Issue resolved
|
Fever
|
2021-07-16
|
2021-10-20
|
2021-11-21
|
C-4
|
C
|
4
|
Issue resolved
|
Joint pain, stiffness and swelling
|
2021-08-24
|
2021-10-13
|
2021-11-11
|
C-5
|
C
|
5
|
Issue resolved
|
Joint pain
|
2021-08-24
|
2021-10-14
|
2021-11-16
|
InitialDataJoin <- dfdiffs::create_new_column(
data = InitialData,
cols = c("subject_id", "record"),
new_name = "join_var")
InitialDataJoin
InitialDataJoin
join_var
|
subject_id
|
record
|
text_value_a
|
text_value_b
|
created_date
|
updated_date
|
entered_date
|
A-1
|
A
|
1
|
Issue unresolved
|
Fatigue
|
2021-07-29
|
2021-09-29
|
2021-09-29
|
A-2
|
A
|
2
|
Issue unresolved
|
Fatigue
|
2021-07-29
|
2021-10-03
|
2021-10-29
|
B-3
|
B
|
3
|
Issue resolved
|
Fever
|
2021-07-16
|
2021-09-02
|
2021-08-18
|
C-4
|
C
|
4
|
Issue resolved
|
Joint pain
|
2021-08-24
|
2021-10-03
|
2021-10-03
|
C-5
|
C
|
5
|
Issue resolved
|
Joint pain
|
2021-08-24
|
2021-09-20
|
2021-10-20
|
create_changed_data()
Each comparison function in the dfdiffs
package assumes
base
and compare
datasets might have
one of the following conditions:
Two datasets
Multiple columns to compare (cols
)
Single by column
Single by
column, new column name
(by_col
)
Single by
column, multiple compare columns
(cols
)
Single
by column, new column name
(by_col
), multiple compare columns
(cols
)
Multiple by
columns
Multiple by
columns, new column name
(by_col
)
Multiple by
columns, multiple compare columns
(cols
)
Multiple by
columns, a new by_col
, and
cols
Single by column conditions
1) Two datasets
compare all columns:
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
updated_date
|
5
|
entered_date
|
5
|
|
variable
|
rownumber
|
base
|
compare
|
text_value_a
|
1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
4
|
Joint pain
|
Joint pain, stiffness and swelling
|
updated_date
|
1
|
2021-09-29
|
2021-10-03
|
updated_date
|
2
|
2021-10-03
|
2021-11-27
|
updated_date
|
3
|
2021-09-02
|
2021-10-20
|
updated_date
|
4
|
2021-10-03
|
2021-10-13
|
updated_date
|
5
|
2021-09-20
|
2021-10-14
|
entered_date
|
1
|
2021-09-29
|
2021-11-30
|
entered_date
|
2
|
2021-10-29
|
2021-11-30
|
entered_date
|
3
|
2021-08-18
|
2021-11-21
|
entered_date
|
4
|
2021-10-03
|
2021-11-11
|
entered_date
|
5
|
2021-10-20
|
2021-11-16
|
|
2) Multiple columns to compare (cols
)
create_changed_data(
compare = ChangedData,
base = InitialData,
cols = c("text_value_a", "text_value_b"))
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
|
variable
|
rownumber
|
base
|
compare
|
text_value_a
|
1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
4
|
Joint pain
|
Joint pain, stiffness and swelling
|
|
3) Single by
column
No new column name
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
updated_date
|
5
|
entered_date
|
5
|
|
variable
|
join_var
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
updated_date
|
A-1
|
2021-09-29
|
2021-10-03
|
updated_date
|
A-2
|
2021-10-03
|
2021-11-27
|
updated_date
|
B-3
|
2021-09-02
|
2021-10-20
|
updated_date
|
C-4
|
2021-10-03
|
2021-10-13
|
updated_date
|
C-5
|
2021-09-20
|
2021-10-14
|
entered_date
|
A-1
|
2021-09-29
|
2021-11-30
|
entered_date
|
A-2
|
2021-10-29
|
2021-11-30
|
entered_date
|
B-3
|
2021-08-18
|
2021-11-21
|
entered_date
|
C-4
|
2021-10-03
|
2021-11-11
|
entered_date
|
C-5
|
2021-10-20
|
2021-11-16
|
|
4) Single by
column, new column name
(by_col
)
create_changed_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var",
by_col = "join")
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
updated_date
|
5
|
entered_date
|
5
|
|
variable
|
join
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
updated_date
|
A-1
|
2021-09-29
|
2021-10-03
|
updated_date
|
A-2
|
2021-10-03
|
2021-11-27
|
updated_date
|
B-3
|
2021-09-02
|
2021-10-20
|
updated_date
|
C-4
|
2021-10-03
|
2021-10-13
|
updated_date
|
C-5
|
2021-09-20
|
2021-10-14
|
entered_date
|
A-1
|
2021-09-29
|
2021-11-30
|
entered_date
|
A-2
|
2021-10-29
|
2021-11-30
|
entered_date
|
B-3
|
2021-08-18
|
2021-11-21
|
entered_date
|
C-4
|
2021-10-03
|
2021-11-11
|
entered_date
|
C-5
|
2021-10-20
|
2021-11-16
|
|
5) Single by
column, multiple compare columns
(cols
)
create_changed_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var",
cols = c("text_value_a", "text_value_b"))
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
|
variable
|
join_var
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
|
6) Single by
column, new column name
(by_col
), multiple compare columns (cols
)
create_changed_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var",
by_col = "join",
cols = c("text_value_a", "text_value_b"))
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
|
variable
|
join
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
|
Multiple by column conditions
7) Multiple by
columns
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
updated_date
|
5
|
entered_date
|
5
|
|
variable
|
join
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
updated_date
|
A-1
|
2021-09-29
|
2021-10-03
|
updated_date
|
A-2
|
2021-10-03
|
2021-11-27
|
updated_date
|
B-3
|
2021-09-02
|
2021-10-20
|
updated_date
|
C-4
|
2021-10-03
|
2021-10-13
|
updated_date
|
C-5
|
2021-09-20
|
2021-10-14
|
entered_date
|
A-1
|
2021-09-29
|
2021-11-30
|
entered_date
|
A-2
|
2021-10-29
|
2021-11-30
|
entered_date
|
B-3
|
2021-08-18
|
2021-11-21
|
entered_date
|
C-4
|
2021-10-03
|
2021-11-11
|
entered_date
|
C-5
|
2021-10-20
|
2021-11-16
|
|
8) Multiple by
columns, new column name
(by_col
)
create_changed_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"),
by_col = "new_join_var")
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
updated_date
|
5
|
entered_date
|
5
|
|
variable
|
new_join_var
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
updated_date
|
A-1
|
2021-09-29
|
2021-10-03
|
updated_date
|
A-2
|
2021-10-03
|
2021-11-27
|
updated_date
|
B-3
|
2021-09-02
|
2021-10-20
|
updated_date
|
C-4
|
2021-10-03
|
2021-10-13
|
updated_date
|
C-5
|
2021-09-20
|
2021-10-14
|
entered_date
|
A-1
|
2021-09-29
|
2021-11-30
|
entered_date
|
A-2
|
2021-10-29
|
2021-11-30
|
entered_date
|
B-3
|
2021-08-18
|
2021-11-21
|
entered_date
|
C-4
|
2021-10-03
|
2021-11-11
|
entered_date
|
C-5
|
2021-10-20
|
2021-11-16
|
|
9) Multiple by
columns, multiple compare columns
(cols
)
create_changed_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"),
cols = c("text_value_a", "text_value_b"))
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
|
variable
|
join
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
|
10) Multiple by
columns, a new by_col
, and
cols
create_changed_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"),
by_col = "join",
cols = c("text_value_a", "text_value_b"))
variable
|
no_of_differences
|
text_value_a
|
2
|
text_value_b
|
1
|
|
variable
|
join
|
base
|
compare
|
text_value_a
|
A-1
|
Issue unresolved
|
Issue resolved
|
text_value_a
|
A-2
|
Issue unresolved
|
Issue resolved
|
text_value_b
|
C-4
|
Joint pain
|
Joint pain, stiffness and swelling
|
|
create_modified_data()
Below we have the create_modified_data()
function, which
takes two data frames (compare
and base
), a
by
column, and the cols
to compare.
We’ll test this function below on all ten possible conditions for
base
and compare
.
Single joining (by
) column conditions
1) Two datasets
- No
by
columns (only two datasets)
Variable name
|
Modified Values
|
Missing Values
|
subject_id
|
0
|
0
|
record
|
0
|
0
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
created_date
|
0
|
0
|
updated_date
|
5
|
0
|
entered_date
|
5
|
0
|
|
Variable name
|
Current Value
|
Previous Value
|
text_value_a
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
Joint pain, stiffness and swelling
|
Joint pain
|
updated_date
|
2021-10-03
|
2021-09-29
|
updated_date
|
2021-11-27
|
2021-10-03
|
updated_date
|
2021-10-20
|
2021-09-02
|
updated_date
|
2021-10-13
|
2021-10-03
|
updated_date
|
2021-10-14
|
2021-09-20
|
entered_date
|
2021-11-30
|
2021-09-29
|
entered_date
|
2021-11-30
|
2021-10-29
|
entered_date
|
2021-11-21
|
2021-08-18
|
entered_date
|
2021-11-11
|
2021-10-03
|
entered_date
|
2021-11-16
|
2021-10-20
|
|
2) Multiple columns to compare (cols
)
- No
by
columns (only two datasets) and multiple compare
(cols
)
compare_list <- create_modified_data(
compare = ChangedData,
base = InitialData,
cols = c("text_value_a", "text_value_b"))
Variable name
|
Modified Values
|
Missing Values
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
|
Variable name
|
Current Value
|
Previous Value
|
text_value_a
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
Joint pain, stiffness and swelling
|
Joint pain
|
|
3) Single by
column
- We can provide a single
by
column (using our
InitialDataJoin
and ChangedDataJoin
) datasets
we created above.
compare_list <- create_modified_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var")
Variable name
|
Modified Values
|
Missing Values
|
subject_id
|
0
|
0
|
record
|
0
|
0
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
created_date
|
0
|
0
|
updated_date
|
5
|
0
|
entered_date
|
5
|
0
|
|
Variable name
|
join_var
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
updated_date
|
A-1
|
2021-10-03
|
2021-09-29
|
updated_date
|
A-2
|
2021-11-27
|
2021-10-03
|
updated_date
|
B-3
|
2021-10-20
|
2021-09-02
|
updated_date
|
C-4
|
2021-10-13
|
2021-10-03
|
updated_date
|
C-5
|
2021-10-14
|
2021-09-20
|
entered_date
|
A-1
|
2021-11-30
|
2021-09-29
|
entered_date
|
A-2
|
2021-11-30
|
2021-10-29
|
entered_date
|
B-3
|
2021-11-21
|
2021-08-18
|
entered_date
|
C-4
|
2021-11-11
|
2021-10-03
|
entered_date
|
C-5
|
2021-11-16
|
2021-10-20
|
|
4) Single by
column, new column name
(by_col
)
- We can also provide a single
by
column (for unique
identifiers) and a new name for the by_col
compare_list <- create_modified_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var",
by_col = "new_join_var")
Variable name
|
Modified Values
|
Missing Values
|
subject_id
|
0
|
0
|
record
|
0
|
0
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
created_date
|
0
|
0
|
updated_date
|
5
|
0
|
entered_date
|
5
|
0
|
|
Variable name
|
new_join_var
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
updated_date
|
A-1
|
2021-10-03
|
2021-09-29
|
updated_date
|
A-2
|
2021-11-27
|
2021-10-03
|
updated_date
|
B-3
|
2021-10-20
|
2021-09-02
|
updated_date
|
C-4
|
2021-10-13
|
2021-10-03
|
updated_date
|
C-5
|
2021-10-14
|
2021-09-20
|
entered_date
|
A-1
|
2021-11-30
|
2021-09-29
|
entered_date
|
A-2
|
2021-11-30
|
2021-10-29
|
entered_date
|
B-3
|
2021-11-21
|
2021-08-18
|
entered_date
|
C-4
|
2021-11-11
|
2021-10-03
|
entered_date
|
C-5
|
2021-11-16
|
2021-10-20
|
|
5) Single by
column, multiple compare columns
cols
- Single
by
column and multiple compare columns
(cols
)
compare_list <- create_modified_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var",
cols = c("text_value_a", "text_value_b"))
Variable name
|
Modified Values
|
Missing Values
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
|
Variable name
|
join_var
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
|
6) Single by
column, new column name
(by_col
), multiple compare columns (cols
)
- Single
by
column, a new name for the by column
(by_col
), and multiple compare columns
(cols
)
compare_list <- create_modified_data(
compare = ChangedDataJoin,
base = InitialDataJoin,
by = "join_var",
by_col = "new_join_var",
cols = c("text_value_a", "text_value_b"))
Variable name
|
Modified Values
|
Missing Values
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
|
Variable name
|
new_join_var
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
|
Multiple by
column conditions
The next conditions cover when two existing columns are used to
create a unique identifier between the two datasets.
7) Multiple by
columns
- Multiple
by
columns (assuming the columns create a
unique identifier).
compare_list <- create_modified_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"))
Variable name
|
Modified Values
|
Missing Values
|
subject_id
|
0
|
0
|
record
|
0
|
0
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
created_date
|
0
|
0
|
updated_date
|
5
|
0
|
entered_date
|
5
|
0
|
|
Variable name
|
join
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
updated_date
|
A-1
|
2021-10-03
|
2021-09-29
|
updated_date
|
A-2
|
2021-11-27
|
2021-10-03
|
updated_date
|
B-3
|
2021-10-20
|
2021-09-02
|
updated_date
|
C-4
|
2021-10-13
|
2021-10-03
|
updated_date
|
C-5
|
2021-10-14
|
2021-09-20
|
entered_date
|
A-1
|
2021-11-30
|
2021-09-29
|
entered_date
|
A-2
|
2021-11-30
|
2021-10-29
|
entered_date
|
B-3
|
2021-11-21
|
2021-08-18
|
entered_date
|
C-4
|
2021-11-11
|
2021-10-03
|
entered_date
|
C-5
|
2021-11-16
|
2021-10-20
|
|
This creates a join
column from the by
columns.
8) Multiple by
columns, new column name
(by_col
)
We can provide multiple by
columns, a new
by_col
, and no cols
compare_list <- create_modified_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"),
by_col = "new_join_var")
Variable name
|
Modified Values
|
Missing Values
|
subject_id
|
0
|
0
|
record
|
0
|
0
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
created_date
|
0
|
0
|
updated_date
|
5
|
0
|
entered_date
|
5
|
0
|
|
Variable name
|
new_join_var
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
updated_date
|
A-1
|
2021-10-03
|
2021-09-29
|
updated_date
|
A-2
|
2021-11-27
|
2021-10-03
|
updated_date
|
B-3
|
2021-10-20
|
2021-09-02
|
updated_date
|
C-4
|
2021-10-13
|
2021-10-03
|
updated_date
|
C-5
|
2021-10-14
|
2021-09-20
|
entered_date
|
A-1
|
2021-11-30
|
2021-09-29
|
entered_date
|
A-2
|
2021-11-30
|
2021-10-29
|
entered_date
|
B-3
|
2021-11-21
|
2021-08-18
|
entered_date
|
C-4
|
2021-11-11
|
2021-10-03
|
entered_date
|
C-5
|
2021-11-16
|
2021-10-20
|
|
9) Multiple by
columns, multiple compare columns
(cols
)
- Multiple
by
columns and multiple compare columns
(cols
), and no new
by_col
.
compare_list <- create_modified_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"),
cols = c("text_value_a", "text_value_b"))
Variable name
|
Modified Values
|
Missing Values
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
|
Variable name
|
join
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
|
This creates a join
column from the by
columns.
10) Multiple by
columns, a new by_col
, and
cols
- We can provide multiple
by
columns, a new
by_col
, and multiple cols
.
compare_list <- create_modified_data(
compare = ChangedData,
base = InitialData,
by = c("subject_id", "record"),
by_col = "new_join_var",
cols = c("text_value_a", "text_value_b"))
Variable name
|
Modified Values
|
Missing Values
|
text_value_a
|
2
|
0
|
text_value_b
|
1
|
0
|
|
Variable name
|
new_join_var
|
Current Value
|
Previous Value
|
text_value_a
|
A-1
|
Issue resolved
|
Issue unresolved
|
text_value_a
|
A-2
|
Issue resolved
|
Issue unresolved
|
text_value_b
|
C-4
|
Joint pain, stiffness and swelling
|
Joint pain
|
|