Split a data frame into multiple datasets according to the specified ratios.
A data frame.
Name of the y variable. Defaults to NULL. If it is provided, the input data is split based on y.
A numeric vector giving the proportion of total rows in each split. Defaults to c(0.7, 0.3).
Names of the returned data frames. Its length should equal that of ratios. Defaults to train and test.
Parameters for the out-of-time validation dataset. To create one, time_cols and either time_start or ratio must be supplied.
A random seed. Defaults to 618.
Additional parameters.
A list of data frames, one per split.
# load German credit data
data(germancredit)
# Example I
dt_list = split_df(germancredit, y="creditability")
# dimensions of each split dataset
lapply(dt_list, dim)
#> $train
#> [1] 681 21
#>
#> $test
#> [1] 319 21
#>
# Example II
dt_list2 = split_df(germancredit, y="creditability",
                    ratios = c(0.5, 0.3, 0.2),
                    name_dfs = c('train', 'test', 'valid'))
lapply(dt_list2, dim)
#> $train
#> [1] 491 21
#>
#> $test
#> [1] 298 21
#>
#> $valid
#> [1] 211 21
#>
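
To make the roles of y, ratios, name_dfs and seed more concrete, here is a minimal base-R sketch of a ratio-based split performed within each level of y. It is an illustration only, not the package's implementation: split_sketch and the toy data frame are hypothetical names introduced here, and the sketch allocates rows by exact proportion, whereas the outputs above (681 and 319 rows for a 0.7/0.3 split) show that split_df only approximates the requested ratios.

```r
# Hypothetical helper for illustration; NOT part of the package.
split_sketch <- function(dt, y = NULL, ratios = c(0.7, 0.3),
                         name_dfs = c("train", "test"), seed = 618) {
  stopifnot(length(ratios) == length(name_dfs))
  set.seed(seed)

  # Turn the ratios into cumulative break points ending exactly at 1.
  cs <- cumsum(ratios)
  breaks <- c(0, cs / cs[length(cs)])

  # Rows are grouped by y; without y, all rows form a single group.
  groups <- if (is.null(y)) rep(1L, nrow(dt)) else dt[[y]]

  # Within each group, shuffle the rows and assign each one to a split
  # according to its relative position in the shuffled order.
  part <- integer(nrow(dt))
  for (idx in split(seq_len(nrow(dt)), groups)) {
    pos <- sample.int(length(idx)) / length(idx)
    part[idx] <- cut(pos, breaks, labels = FALSE)
  }

  out <- lapply(seq_along(ratios), function(i) dt[part == i, , drop = FALSE])
  setNames(out, name_dfs)
}

# Toy data (an assumption for this sketch): 80 rows with y = 0, 20 with y = 1.
toy <- data.frame(x = seq_len(100), y = rep(c(0, 1), times = c(80, 20)))
parts <- split_sketch(toy, y = "y")
sapply(parts, nrow)
```

Because rows are assigned within each level of y, every split in this sketch keeps roughly the same class balance as the input, which is the usual reason for passing y when splitting data for model development.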