pivot_string_long() splits a string (or vector of strings) into its constituent parts based on a specified regex pattern and returns a 'tidy' (long) data.frame.
Arguments
- string
A character vector; the string or strings to split.
- sep
A regular expression used as the separator to split the string(s) into items. The default is "[^[:alnum:]]+", which splits on one or more non-alphanumeric characters.
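To illustrate what a non-alphanumeric separator does, here is a minimal base R sketch using strsplit(). It assumes the pattern is written with the POSIX class inside a bracket expression, "[^[:alnum:]]+"; the sample string is made up for illustration.

```r
# Illustrative input, not taken from the package.
x <- "apple, orange; banana"

# One or more non-alphanumeric characters are treated as a single separator,
# so ", " and "; " each produce exactly one split.
items <- strsplit(x, "[^[:alnum:]]+")[[1]]
items
```

Because the pattern ends in "+", a run of punctuation and white space counts as one separator and does not produce empty items between words.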
Value
A data frame with two columns: unique_items, containing the items extracted from the string(s), and string, containing the original string(s). Each row represents one item. For each input string, the original string appears in the first row of its items and the remaining rows of the string column are NA.
Note
The function handles vectors of strings by applying the splitting and data
frame creation process to each element of the vector and then row-binding
the individual data frames into a single data frame. For each input string x,
the returned string column contains the original string followed by
length(unlist(strsplit(x, sep))) - 1 missing values.
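The process described in this note can be sketched in base R as follows. This is not the package's implementation: pivot_sketch() is a hypothetical helper written only to illustrate the split, pad with NA, and row-bind steps, and it assumes the separator is applied with strsplit().

```r
# Hypothetical helper for illustration only; not the package source.
pivot_sketch <- function(string, sep = "[^[:alnum:]]+") {
  pieces <- lapply(string, function(s) {
    items <- unlist(strsplit(s, sep))
    # Guard for inputs that yield no items (for example an empty string).
    if (length(items) == 0) items <- NA_character_
    data.frame(
      unique_items = items,
      # Original string in the first row, NA in the remaining rows.
      string = c(s, rep(NA_character_, length(items) - 1)),
      stringsAsFactors = FALSE
    )
  })
  # Row-bind the per-string data frames into one long data frame.
  do.call(rbind, pieces)
}

out <- pivot_sketch(c("one-two-three", "cat-dog"))
out
```

For the two inputs above, the string column holds each original string once, followed by one NA per additional item.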
Details
If a single string is provided, the function will split that string into
items and return a data.frame with each item and the original string. If a
vector of strings is provided, the function applies the splitting process
to each string in the vector and combines the results into a single
data.frame.
Examples
pivot_string_long("one-two-three")
#>   unique_items        string
#> 1          one one-two-three
#> 2          two          <NA>
#> 3        three          <NA>
# include white space
pivot_string_long(
  c("apple, orange, banana", "cat-dog"),
  sep = ",?\\s*-?\\s*")
#>    unique_items                string
#> 1             a apple, orange, banana
#> 2             p                  <NA>
#> 3             p                  <NA>
#> 4             l                  <NA>
#> 5             e                  <NA>
#> 6                                <NA>
#> 7             o                  <NA>
#> 8             r                  <NA>
#> 9             a                  <NA>
#> 10            n                  <NA>
#> 11            g                  <NA>
#> 12            e                  <NA>
#> 13                               <NA>
#> 14            b                  <NA>
#> 15            a                  <NA>
#> 16            n                  <NA>
#> 17            a                  <NA>
#> 18            n                  <NA>
#> 19            a                  <NA>
#> 20            c               cat-dog
#> 21            a                  <NA>
#> 22            t                  <NA>
#> 23                               <NA>
#> 24            d                  <NA>
#> 25            o                  <NA>
#> 26            g                  <NA>
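The single-character rows in the output above arise because every element of the pattern ",?\\s*-?\\s*" is optional, so it can match the empty string. When that happens, strsplit() splits the input one character at a time. The sketch below contrasts that pattern with one that must consume at least one separator character. It uses base R strsplit() directly, on the assumption that the function applies sep in the same way; the replacement pattern "[,[:space:]-]+" is a suggestion, not something the package documents.

```r
x <- c("apple, orange, banana", "cat-dog")

# Every part of this pattern is optional, so it can match the empty string
# and the input is split into single characters.
per_char <- strsplit(x, ",?\\s*-?\\s*")

# Requiring at least one comma, white-space character, or hyphen keeps
# whole words together.
per_word <- strsplit(x, "[,[:space:]-]+")
per_word
```

With the second pattern, each input is split into its words only, which is likely what the "include white space" example was meant to show.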
# longer strings
pivot_string_long("A large size in stockings is hard to sell.")
#>   unique_items                                     string
#> 1            A A large size in stockings is hard to sell.
#> 2        large                                       <NA>
#> 3         size                                       <NA>
#> 4           in                                       <NA>
#> 5    stockings                                       <NA>
#> 6           is                                       <NA>
#> 7         hard                                       <NA>
#> 8           to                                       <NA>
#> 9         sell                                       <NA>
# larger strings
pivot_string_long(c("A large size in stockings is hard to sell.",
                    "The first part of the plan needs changing."))
#>    unique_items                                     string
#> 1             A A large size in stockings is hard to sell.
#> 2         large                                       <NA>
#> 3          size                                       <NA>
#> 4            in                                       <NA>
#> 5     stockings                                       <NA>
#> 6            is                                       <NA>
#> 7          hard                                       <NA>
#> 8            to                                       <NA>
#> 9          sell                                       <NA>
#> 10          The The first part of the plan needs changing.
#> 11        first                                       <NA>
#> 12         part                                       <NA>
#> 13           of                                       <NA>
#> 14          the                                       <NA>
#> 15         plan                                       <NA>
#> 16        needs                                       <NA>
#> 17     changing                                       <NA>