Pivot a string into a long data frame format — pivot_string

pivot_string_long() splits a string (or each element of a character vector) into items wherever a regular-expression separator matches, and returns the items as a 'tidy' (long) data.frame with one row per item.

Usage

pivot_string_long(string, sep = "[^[:alnum:]]+")

Arguments

string: A character vector; the string or strings to split.
sep: A regular expression used as the separator to split the string(s) into items. The default, "[^[:alnum:]]+", matches one or more consecutive non-alphanumeric characters, so the string(s) are split on runs of spaces and punctuation.
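To see which items the default separator produces, the same pattern can be passed to base R's strsplit(). This only illustrates the regular expression. It does not call pivot_string_long().

```r
# Default separator: one or more consecutive characters that are
# neither letters nor digits
sep <- "[^[:alnum:]]+"

# Hyphens act as separators: c("one", "two", "three")
parts <- strsplit("one-two-three", sep)[[1]]

# Spaces and the trailing period are consumed too: c("hard", "to", "sell")
words <- strsplit("hard to sell.", sep)[[1]]
```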

Value

A data frame with two columns: unique_items, containing the items extracted from the string(s), and string, containing the original string(s). Each row holds one item. Despite the column name, duplicate items are not removed (see the second example). For each input string, the original string appears in the first row of that string's block of rows, and the remaining rows of the block are NA.
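The layout of the returned data frame can be reproduced by hand for a single string. The following is a minimal sketch in base R, assuming the default separator. It is not the package's source code.

```r
x   <- "one-two-three"
sep <- "[^[:alnum:]]+"

# Split the string into its items (duplicates would be kept)
items <- strsplit(x, sep)[[1]]

# The original string fills the first row only; the remaining rows are NA
out <- data.frame(
  unique_items     = items,
  string           = c(x, rep(NA_character_, length(items) - 1)),
  stringsAsFactors = FALSE
)
out
```

Printed, out has the same three-row layout as the first example under Examples.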

Note

The function handles a vector of strings by splitting each element, building a data frame for each, and row-binding the results into a single data frame. For each input string x, the string column contains x once, followed by length(strsplit(x, sep)[[1]]) - 1 missing (NA) values.
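As a quick check of that row count, the number of NA values for one string can be computed directly with strsplit(), assuming the default separator. The string is taken from the third example.

```r
x   <- "A large size in stockings is hard to sell."
sep <- "[^[:alnum:]]+"

# Number of items produced for this string
n_items <- length(strsplit(x, sep)[[1]])

# Rows after the first one carry NA in the string column
n_na <- n_items - 1
```

Here n_items is 9 and n_na is 8, matching the nine rows (one original string, eight NA) in the third example.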

Details

If a single string is provided, the function splits it into items and returns a data.frame with one row per item, placing the original string in the first row. If a vector of strings is provided, the function applies the same splitting to each element and combines the results into a single data.frame.
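The per-element process can be sketched with lapply() and do.call(rbind, ...). The name pivot_string_long_sketch() is hypothetical, and the body is an assumption about how such a function could work, not the package's implementation.

```r
# Hypothetical re-implementation for illustration only; not the package source
pivot_string_long_sketch <- function(string, sep = "[^[:alnum:]]+") {
  # Build one long data frame per input string
  pieces <- lapply(string, function(x) {
    items <- strsplit(x, sep)[[1]]
    data.frame(
      unique_items     = items,
      string           = c(x, rep(NA_character_, length(items) - 1)),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-string data frames into a single data frame
  do.call(rbind, pieces)
}

res <- pivot_string_long_sketch(c("one-two-three", "cat-dog"))
res
```

Row-binding keeps each string's block contiguous, so the original string always marks the start of its own block.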

Examples

pivot_string_long("one-two-three")
#>   unique_items        string
#> 1          one one-two-three
#> 2          two          <NA>
#> 3        three          <NA>
# a separator that can match the empty string splits between
# individual characters
pivot_string_long(
  c("apple, orange, banana", "cat-dog"),
  sep = ",?\\s*-?\\s*")
#>    unique_items                string
#> 1             a apple, orange, banana
#> 2             p                  <NA>
#> 3             p                  <NA>
#> 4             l                  <NA>
#> 5             e                  <NA>
#> 6                                <NA>
#> 7             o                  <NA>
#> 8             r                  <NA>
#> 9             a                  <NA>
#> 10            n                  <NA>
#> 11            g                  <NA>
#> 12            e                  <NA>
#> 13                               <NA>
#> 14            b                  <NA>
#> 15            a                  <NA>
#> 16            n                  <NA>
#> 17            a                  <NA>
#> 18            n                  <NA>
#> 19            a                  <NA>
#> 20            c               cat-dog
#> 21            a                  <NA>
#> 22            t                  <NA>
#> 23                               <NA>
#> 24            d                  <NA>
#> 25            o                  <NA>
#> 26            g                  <NA>
# longer strings
pivot_string_long("A large size in stockings is hard to sell.")
#>   unique_items                                     string
#> 1            A A large size in stockings is hard to sell.
#> 2        large                                       <NA>
#> 3         size                                       <NA>
#> 4           in                                       <NA>
#> 5    stockings                                       <NA>
#> 6           is                                       <NA>
#> 7         hard                                       <NA>
#> 8           to                                       <NA>
#> 9         sell                                       <NA>
# a vector of several strings
pivot_string_long(c("A large size in stockings is hard to sell.",
                    "The first part of the plan needs changing."))
#>    unique_items                                     string
#> 1             A A large size in stockings is hard to sell.
#> 2         large                                       <NA>
#> 3          size                                       <NA>
#> 4            in                                       <NA>
#> 5     stockings                                       <NA>
#> 6            is                                       <NA>
#> 7          hard                                       <NA>
#> 8            to                                       <NA>
#> 9          sell                                       <NA>
#> 10          The The first part of the plan needs changing.
#> 11        first                                       <NA>
#> 12         part                                       <NA>
#> 13           of                                       <NA>
#> 14          the                                       <NA>
#> 15         plan                                       <NA>
#> 16        needs                                       <NA>
#> 17     changing                                       <NA>
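In the second example, every part of the separator ",?\\s*-?\\s*" is optional, so it can match zero characters and the strings are split between letters rather than between words. A minimal base-R comparison follows. It assumes the intended split was on a comma plus optional spaces, or on a hyphen, and uses an alternation in which every branch consumes at least one character.

```r
x <- c("apple, orange, banana", "cat-dog")

# Zero-length matches split between letters; the consumed "-" leaves an
# empty item, as in rows 20 to 26 of the second example
chars <- strsplit(x[2], ",?\\s*-?\\s*")[[1]]

# Each branch consumes at least one character, so only ", " or "-" split
words <- strsplit(x, ",\\s*|-")
```

Passing sep = ",\\s*|-" to pivot_string_long() would presumably give one row per word, although that call is not shown among the examples.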