Feature Selection with Design Points
Source: R/FSelectorBatchDesignPoints.R
mlr_fselectors_design_points.Rd
Feature selection using user-defined feature sets.
Details
The feature sets are evaluated in the order given.
Feature selection terminates on its own once all feature sets have been evaluated, so no termination criterion needs to be set.
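The behaviour described above can be sketched as follows. This is a minimal illustration, assuming that the mlr3 and mlr3fselect packages are installed and that omitting `term_evals`, `term_time` and `terminator` in `fselect()` leaves the run without any extra stopping rule. The three feature sets are arbitrary choices made for this sketch.

```r
library(mlr3)
library(mlr3fselect)

task = tsk("pima")

# three nested feature sets, one per row (chosen arbitrarily for illustration)
design = mlr3misc::rowwise_table(
  ~age, ~glucose, ~insulin, ~mass, ~pedigree, ~pregnant, ~pressure, ~triceps,
  TRUE, TRUE,     FALSE,    FALSE, FALSE,     FALSE,     FALSE,     FALSE,
  TRUE, TRUE,     FALSE,    TRUE,  FALSE,     FALSE,     FALSE,     FALSE,
  TRUE, TRUE,     FALSE,    TRUE,  FALSE,     TRUE,      FALSE,     FALSE
)

# no termination criterion is passed; the run ends after the last design row
instance = fselect(
  fselector = fs("design_points", design = design),
  task = task,
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce")
)

# the archive holds one row per evaluated feature set, in design order
archive = as.data.table(instance$archive)
nrow(archive)
archive$mass
archive$pregnant
```

Comparing the `mass` and `pregnant` columns of the archive with the same columns of `design` is a quick way to confirm that the rows were evaluated in the order they were supplied.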
Parameters
batch_size
integer(1)
Maximum number of configurations to try in a batch.
design
data.table::data.table
Feature sets (design points) to evaluate, one per row, with one logical column per feature.
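Typing a logical row for every feature set becomes tedious with many features. The sketch below builds such a table from vectors of feature names. The helper `make_design()` is hypothetical and not part of mlr3fselect, and the feature names are a shortened illustrative list.

```r
library(data.table)

# hypothetical helper: one logical row per feature set, one column per feature
make_design = function(feature_sets, all_features) {
  rows = lapply(feature_sets, function(set) {
    # TRUE where the feature belongs to the set, FALSE otherwise
    as.list(setNames(all_features %in% set, all_features))
  })
  rbindlist(rows)
}

all_features = c("age", "glucose", "insulin", "mass")
feature_sets = list(
  c("age", "glucose"),
  c("age", "insulin", "mass")
)

design = make_design(feature_sets, all_features)
design
```

The resulting `data.table` has the same shape as a table written by hand with `mlr3misc::rowwise_table()`, so under that assumption it could be passed as the `design` parameter.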
Super classes
mlr3fselect::FSelector
-> mlr3fselect::FSelectorBatch
-> mlr3fselect::FSelectorBatchFromOptimizerBatch
-> FSelectorBatchDesignPoints
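The class hierarchy and the parameters listed above can be inspected on a constructed object. This is a small sketch, assuming mlr3fselect is installed and that the fselector can be constructed before a design is supplied.

```r
library(mlr3fselect)

# construct the fselector without a design; parameters can be set later
fselector = fs("design_points")

# class vector, most specific class first
class(fselector)

# inheritance checks against the super classes
inherits(fselector, "FSelectorBatch")
inherits(fselector, "FSelector")

# ids of the available parameters
fselector$param_set$ids()
```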
Examples
# Feature Selection
# \donttest{
# retrieve task and load learner
task = tsk("pima")
learner = lrn("classif.rpart")
# create design
design = mlr3misc::rowwise_table(
  ~age, ~glucose, ~insulin, ~mass, ~pedigree, ~pregnant, ~pressure, ~triceps,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,      FALSE,     TRUE,
  TRUE, TRUE,     FALSE,    TRUE,  FALSE,     TRUE,      FALSE,     FALSE,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,      FALSE,     FALSE,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,      TRUE,      TRUE
)
# run feature selection on the Pima Indians diabetes data set
instance = fselect(
  fselector = fs("design_points", design = design),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce")
)
# best performing feature set
instance$result
#> age glucose insulin mass pedigree pregnant pressure triceps
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
#> features n_features classif.ce
#> <list> <int> <num>
#> 1: age,glucose,mass,pregnant 4 0.2539062
# all evaluated feature sets
as.data.table(instance$archive)
#> age glucose insulin mass pedigree pregnant pressure triceps classif.ce
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <num>
#> 1: TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE 0.2968750
#> 2: TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE 0.2539062
#> 3: TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE 0.2773438
#> 4: TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE 0.2929688
#> runtime_learners timestamp batch_nr warnings errors
#> <num> <POSc> <int> <int> <int>
#> 1: 0.006 2024-11-07 21:50:17 1 0 0
#> 2: 0.007 2024-11-07 21:50:17 2 0 0
#> 3: 0.006 2024-11-07 21:50:17 3 0 0
#> 4: 0.006 2024-11-07 21:50:18 4 0 0
#> features n_features resample_result
#> <list> <list> <list>
#> 1: age,insulin,mass,pregnant,triceps 5 <ResampleResult>
#> 2: age,glucose,mass,pregnant 4 <ResampleResult>
#> 3: age,insulin,mass,pregnant 4 <ResampleResult>
#> 4: age,insulin,mass,pregnant,pressure,triceps 6 <ResampleResult>
# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }