split_cols()
splits a specific column into multiple columns based on a
provided pattern (new columns can have a specified prefix).
Arguments
- data
A data.frame or tibble with a column to split.
- col
name of the character column to split
- pattern
regular expression used to define the split points in the column's values. The default is "[^[:alnum:]]+", which matches one or more non-alphanumeric characters.
- col_prefix
prefix to use for the columns created from the split. The default prefix is col_.
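As a rough illustration of how a pattern defines split points, the sketch below uses base R's strsplit() with the POSIX class form "[^[:alnum:]]+". The input vector x is made up for the example, and whether split_cols() calls strsplit() internally is an assumption.

```r
# Hypothetical input values, for illustration only
x <- c("John", "John, Jacob", "A-B_C")

# Split wherever one or more non-alphanumeric characters occur
parts <- strsplit(x, split = "[^[:alnum:]]+")

parts[[2]]  # the ", " separator is consumed as a single split point
parts[[3]]  # "-" and "_" are both non-alphanumeric
```

Any run of punctuation or whitespace counts as one separator, so ", " does not produce empty pieces.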
Value
A data frame with the original columns and the new columns created from splitting the specified column.
Details
The function verifies the input types and the presence of the column to be
split in the data frame. It then splits the specified column into a list of
vectors, finds the maximum number of elements from these vectors, and pads
shorter vectors with NA to align all vectors to the same length. These
vectors are then combined into new columns and appended to the original
data frame.
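The steps above can be sketched in base R roughly as follows. This is a minimal re-implementation for illustration, not the package's source: the helper name split_pad_bind() is hypothetical, and joining the prefix and the column index with an underscore is an assumption inferred from the example output.

```r
# Hypothetical helper mirroring the described steps (not the actual source)
split_pad_bind <- function(data, col, pattern = "[^[:alnum:]]+",
                           col_prefix = "col") {
  # 1. split each value of the column into a character vector
  pieces <- strsplit(data[[col]], split = pattern)
  # 2. find the largest number of pieces in any row
  n_max <- max(lengths(pieces))
  # 3. pad shorter vectors with NA so all have length n_max
  padded <- lapply(pieces, function(p) {
    c(p, rep(NA_character_, n_max - length(p)))
  })
  # 4. stack the padded vectors row-wise and name the new columns
  new_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(new_cols) <- paste(col_prefix, seq_len(n_max), sep = "_")
  # 5. append the new columns to the original data
  cbind(data, new_cols)
}

d_sketch <- data.frame(value = c(1L, 2L),
                       name = c("John", "John, Jacob"),
                       stringsAsFactors = FALSE)
out <- split_pad_bind(d_sketch, col = "name")
```

Padding before binding keeps every row the same width, which is what allows the pieces to be stacked into a rectangular block of columns.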
Checks
The function will stop and throw an error if any of the input conditions are
not met, ensuring that the input data.frame, column name, pattern, and
column prefix are all correctly specified and formatted before proceeding.
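A sketch of what such input checks might look like is given below. The helper name check_split_args() and the error messages are hypothetical; the exact conditions and wording used by split_cols() are not shown in this page.

```r
# Hypothetical validation helper (conditions and messages are assumptions)
check_split_args <- function(data, col, pattern, col_prefix) {
  if (!is.data.frame(data)) {
    # tibbles also pass, since they inherit from data.frame
    stop("`data` must be a data.frame or tibble", call. = FALSE)
  }
  if (!is.character(col) || length(col) != 1L) {
    stop("`col` must be a single column name", call. = FALSE)
  }
  if (!col %in% names(data)) {
    stop("`col` was not found in `data`", call. = FALSE)
  }
  if (!is.character(data[[col]])) {
    stop("`col` must refer to a character column", call. = FALSE)
  }
  if (!is.character(pattern) || length(pattern) != 1L) {
    stop("`pattern` must be a single regular expression", call. = FALSE)
  }
  if (!is.character(col_prefix) || length(col_prefix) != 1L) {
    stop("`col_prefix` must be a single string", call. = FALSE)
  }
  invisible(TRUE)
}

ok <- check_split_args(
  data.frame(name = "John, Jacob", stringsAsFactors = FALSE),
  col = "name", pattern = "[^[:alnum:]]+", col_prefix = "col"
)
bad <- tryCatch(
  check_split_args(list(), "name", "[^[:alnum:]]+", "col"),
  error = function(e) e
)
```

Failing early with a specific message makes it easier to tell a missing column apart from a wrongly typed argument.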
Examples
d <- data.frame(value = c(29L, 91L, 39L, 28L, 12L),
                name = c("John", "John, Jacob",
                         "John, Jacob, Jingleheimer",
                         "Jingleheimer, Schmidt",
                         "JJJ, Schmidt"))
# default prefix
split_cols(data = d, col = "name")
#>   value                      name        col_1   col_2        col_3
#> 1    29                      John         John    <NA>         <NA>
#> 2    91               John, Jacob         John   Jacob         <NA>
#> 3    39 John, Jacob, Jingleheimer         John   Jacob Jingleheimer
#> 4    28     Jingleheimer, Schmidt Jingleheimer Schmidt         <NA>
#> 5    12              JJJ, Schmidt          JJJ Schmidt         <NA>
# with prefix
split_cols(data = d, col = "name", col_prefix = "names")
#>   value                      name      names_1 names_2      names_3
#> 1    29                      John         John    <NA>         <NA>
#> 2    91               John, Jacob         John   Jacob         <NA>
#> 3    39 John, Jacob, Jingleheimer         John   Jacob Jingleheimer
#> 4    28     Jingleheimer, Schmidt Jingleheimer Schmidt         <NA>
#> 5    12              JJJ, Schmidt          JJJ Schmidt         <NA>