Applies one-hot encoding to categorical variables and replaces missing values. It is not needed when building a standard scorecard model, but is required for models that do not use WOE transformation.
one_hot(dt, var_skip = NULL, var_encode = NULL, nacol_rm = FALSE, ...)
dt: A data frame.
var_skip: Names of categorical variables to skip during one-hot encoding. Defaults to NULL.
var_encode: Names of categorical variables to be one-hot encoded. Defaults to NULL, in which case all categorical variables except those in var_skip are encoded.
nacol_rm: Logical. When a one-hot encoded categorical variable contains missing values, whether to remove the column generated to indicate the presence of NAs. Defaults to FALSE.
...: Additional parameters.
Returns: A data frame.
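To make the effect of nacol_rm concrete, here is a minimal base-R sketch of one-hot encoding a single column. The helper `one_hot_sketch` and the `toy` data frame are hypothetical names made up for illustration. This is not the implementation used by one_hot, it only mimics the `<variable>_<level>` and `<variable>_NA` column naming visible in the example output.

```r
# Hypothetical helper for illustration only, NOT the scorecard implementation.
# Encodes one character/factor column into 0/1 integer indicator columns.
one_hot_sketch <- function(df, var, nacol_rm = FALSE) {
  x  <- as.character(df[[var]])
  lv <- sort(unique(x[!is.na(x)]))          # observed, non-missing levels

  # start from every column except the one being encoded
  out <- df[setdiff(names(df), var)]

  # optional indicator flagging rows where the original value was missing
  if (!nacol_rm) {
    out[[paste(var, "NA", sep = "_")]] <- as.integer(is.na(x))
  }

  # one indicator per level; rows with NA get 0 in all of them
  for (l in lv) {
    out[[paste(var, l, sep = "_")]] <- as.integer(!is.na(x) & x == l)
  }
  out
}

toy <- data.frame(age = c(30, 45, 52),
                  worker = c("yes", NA, "no"),
                  stringsAsFactors = FALSE)

one_hot_sketch(toy, "worker")                   # age, worker_NA, worker_no, worker_yes
one_hot_sketch(toy, "worker", nacol_rm = TRUE)  # age, worker_no, worker_yes
```

With nacol_rm = TRUE, a row whose original value was missing is still kept, but it is represented only by zeros in every level column.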
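# load germancredit data
library(scorecard)
library(data.table)
data(germancredit)

# keep 3 randomly sampled predictors plus creditability (column 21), then
# append 10 rows holding only creditability, so the predictors contain NAs
dat = rbind(
  setDT(germancredit)[, c(sample(20,3),21)],
  data.table(creditability=sample(c("good","bad"),10,replace=TRUE)),
  fill=TRUE)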
# one hot encoding
## keep the NA indicator columns generated for categorical variables
dat_onehot1 = one_hot(dat, var_skip = 'creditability', nacol_rm = FALSE) # default
str(dat_onehot1)
#> Classes ‘data.table’ and 'data.frame': 1010 obs. of 10 variables:
#> $ age.in.years : num 67 22 49 45 53 35 53 35 61 28 ...
#> $ creditability : Factor w/ 2 levels "bad","good": 2 1 2 2 1 2 2 2 2 1 ...
#> $ foreign.worker_NA : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ foreign.worker_no : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ foreign.worker_yes : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ personal.status.and.sex_NA : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ personal.status.and.sex_female : divorced/separated/married: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ personal.status.and.sex_male : divorced/separated : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ personal.status.and.sex_male : married/widowed : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ personal.status.and.sex_male : single : int 0 0 0 0 0 0 0 0 0 0 ...
#> - attr(*, ".internal.selfref")=<externalptr>
## remove the NA indicator columns generated for categorical variables
dat_onehot2 = one_hot(dat, var_skip = 'creditability', nacol_rm = TRUE)
str(dat_onehot2)
#> Classes ‘data.table’ and 'data.frame': 1010 obs. of 8 variables:
#> $ age.in.years : num 67 22 49 45 53 35 53 35 61 28 ...
#> $ creditability : Factor w/ 2 levels "bad","good": 2 1 2 2 1 2 2 2 2 1 ...
#> $ foreign.worker_no : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ foreign.worker_yes : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ personal.status.and.sex_female : divorced/separated/married: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ personal.status.and.sex_male : divorced/separated : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ personal.status.and.sex_male : married/widowed : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ personal.status.and.sex_male : single : int 0 0 0 0 0 0 0 0 0 0 ...
#> - attr(*, ".internal.selfref")=<externalptr>