Split a dataset into train and test

split_df(dt, y = NULL, ratio = c(0.7, 0.3), seed = 618, no_dfs = 2,
  name_dfs = c("train", "test"))

Arguments

dt

A data frame.

y

Name of the y variable. Defaults to NULL. If it is provided, the input data will be split based on y.

ratio

A numeric vector. Defaults to c(0.7, 0.3). Each value indicates the ratio of total rows contained in one split and must be less than 1.

seed

A random seed. Defaults to 618.

no_dfs

Number of returned data frames. Defaults to 2.

name_dfs

Names of the returned data frames. If its length is not equal to no_dfs, the names will be set to 'dX'. Defaults to train and test.

Value

A list of data frames
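
The page does not say how the split on y is carried out. As a rough illustration only, the sketch below shows one common way to split based on y in base R: sample row indices separately within each level of y, so that every returned data frame keeps roughly the same class proportions. The helper `stratified_split` and the `toy` data are hypothetical and are not part of the package; the actual behaviour of split_df may differ.

```r
# Hypothetical helper, NOT the package's implementation: a stratified split
# in base R that samples row indices separately within each level of y.
stratified_split <- function(dt, y, ratio = c(0.7, 0.3), seed = 618,
                             name_dfs = c("train", "test")) {
  set.seed(seed)
  # row indices grouped by the level of y
  groups <- split(seq_len(nrow(dt)), dt[[y]])
  # one vector of row indices per returned data frame
  picked <- rep(list(integer(0)), length(ratio))
  for (idx in groups) {
    # shuffle the rows belonging to this level
    idx <- idx[sample.int(length(idx))]
    # cumulative cut points; rows beyond sum(ratio) are left out
    cuts <- c(0, round(cumsum(ratio) * length(idx)))
    for (k in seq_along(ratio)) {
      if (cuts[k + 1] > cuts[k]) {
        picked[[k]] <- c(picked[[k]], idx[(cuts[k] + 1):cuts[k + 1]])
      }
    }
  }
  # restore the original row order inside each split
  out <- lapply(picked, function(i) dt[sort(i), , drop = FALSE])
  names(out) <- name_dfs
  out
}

# toy data (assumed for illustration): 100 rows, 30 of them "bad"
toy <- data.frame(x = seq_len(100),
                  y = rep(c("good", "bad"), times = c(70, 30)))
parts <- stratified_split(toy, y = "y")
lapply(parts, dim)
```

In this sketch, a ratio whose values sum to less than 1 simply leaves the remaining rows out of every split, which is consistent with Example II below, where the two returned data frames together contain fewer rows than the input.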

Examples

# load German credit data
data(germancredit)

# Example I
dt_list = split_df(germancredit, y="creditability")
train = dt_list[[1]]
test = dt_list[[2]]

# dimensions of train and test datasets
lapply(dt_list, dim)
#> $train
#> [1] 681 21
#>
#> $test
#> [1] 319 21
#>

# Example II
dt_list2 = split_df(germancredit, y="creditability", ratio = c(0.5, 0.2))
lapply(dt_list2, dim)
#> $train
#> [1] 491 21
#>
#> $test
#> [1] 211 21
#>