Split a column into multiple columns based on a pattern

split_cols() splits one column of a data frame into multiple columns at each match of a regular expression pattern. The new columns are named with a configurable prefix.

Usage

split_cols(data, col, pattern = "[^[:alnum:]]+", col_prefix = "col")

Arguments

data: A data.frame or tibble containing the column to split.
col: Name of the character column to split.
pattern: Regular expression that defines the split points in the column's values. The default is "[^[:alnum:]]+", which matches one or more non-alphanumeric characters.
col_prefix: Prefix for the names of the columns created by the split. The default is "col", which produces columns named col_1, col_2, and so on.

Value

A data frame with the original columns and the new columns created from splitting the specified column.

Details

The function verifies the input types and the presence of the column to be split in the data frame. It then splits the specified column into a list of vectors, finds the maximum number of elements from these vectors, and pads shorter vectors with NA to align all vectors to the same length. These vectors are then combined into new columns and appended to the original data frame.
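The split, pad, and append steps described above can be sketched in base R as follows. This is a minimal illustration, not the package's source: split_cols_sketch() is a hypothetical name, it skips input validation, and it assumes the new columns are named by joining the prefix and an index with an underscore, as in the output under Examples.

```r
# Hypothetical re-implementation for illustration; not the package's source.
split_cols_sketch <- function(data, col, pattern = "[^[:alnum:]]+",
                              col_prefix = "col") {
  # Split each value of the column into a character vector of pieces.
  pieces <- strsplit(as.character(data[[col]]), pattern)

  # The longest split determines how many new columns are needed.
  n_cols <- max(lengths(pieces))

  # Pad shorter vectors with NA: indexing past the end of a vector yields NA.
  padded <- lapply(pieces, function(x) x[seq_len(n_cols)])

  # Stack the padded vectors row-wise, one row per input row.
  new_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(new_cols) <- paste(col_prefix, seq_len(n_cols), sep = "_")

  # Append the new columns to the original data frame.
  cbind(data, new_cols)
}
```

Padding by indexing with seq_len(n_cols) keeps every row the same length, so rbind() can build a rectangular matrix without recycling values.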

Checks

The function stops with an error if any input check fails, so the data frame, column name, pattern, and column prefix are all validated before any splitting is done.
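One way such checks could look is sketched below. The exact conditions and error messages used by split_cols() are not documented here, so check_split_inputs() is a hypothetical helper and the conditions are assumptions drawn from the argument descriptions.

```r
# Hypothetical validation helper; the actual checks in split_cols() may differ.
check_split_inputs <- function(data, col, pattern, col_prefix) {
  stopifnot(
    is.data.frame(data),                            # tibbles also pass this
    is.character(col), length(col) == 1,            # single column name
    col %in% names(data),                           # column must exist
    is.character(pattern), length(pattern) == 1,    # single regular expression
    is.character(col_prefix), length(col_prefix) == 1
  )
  invisible(TRUE)
}
```

stopifnot() reports the first condition that fails, which is usually enough to point the caller at the offending argument.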

Examples

d <- data.frame(value = c(29L, 91L, 39L, 28L, 12L),
                name = c("John", "John, Jacob",
                         "John, Jacob, Jingleheimer",
                         "Jingleheimer, Schmidt",
                         "JJJ, Schmidt"))
# default prefix ("col")
split_cols(data = d, col = "name")
#>   value                      name        col_1   col_2        col_3
#> 1    29                      John         John    <NA>         <NA>
#> 2    91               John, Jacob         John   Jacob         <NA>
#> 3    39 John, Jacob, Jingleheimer         John   Jacob Jingleheimer
#> 4    28     Jingleheimer, Schmidt Jingleheimer Schmidt         <NA>
#> 5    12              JJJ, Schmidt          JJJ Schmidt         <NA>
# custom prefix
split_cols(data = d, col = "name", col_prefix = "names")
#>   value                      name      names_1 names_2      names_3
#> 1    29                      John         John    <NA>         <NA>
#> 2    91               John, Jacob         John   Jacob         <NA>
#> 3    39 John, Jacob, Jingleheimer         John   Jacob Jingleheimer
#> 4    28     Jingleheimer, Schmidt Jingleheimer Schmidt         <NA>
#> 5    12              JJJ, Schmidt          JJJ Schmidt         <NA>