Feature Selection with Sequential Search
Source: R/FSelectorBatchSequential.R
Feature selection using the sequential search algorithm.
Details
Sequential forward selection (strategy = "sfs") extends the feature set in each iteration with the feature that increases the model's performance the most.
Sequential backward selection (strategy = "sbs") follows the same idea but starts with all features and removes features from the set.
The feature selection terminates by itself once min_features (backward search) or max_features (forward search) is reached, so it is not necessary to set a termination criterion.
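The forward strategy described above can be illustrated with a small base R sketch. This is not the package's implementation: `sequential_forward()`, `toy_error()` and the `weights` vector are hypothetical names invented for this illustration, and the toy score stands in for resampling a learner on a feature subset (lower is better).

```r
# Greedy forward search: in each iteration, add the single feature that
# yields the lowest score, until max_features is reached.
# All names here are hypothetical and only serve the illustration.
sequential_forward = function(features, score_subset, max_features = length(features)) {
  selected = character(0)
  path = list()
  while (length(selected) < max_features) {
    candidates = setdiff(features, selected)
    # score every one-feature extension of the current set
    scores = vapply(candidates, function(f) score_subset(c(selected, f)), numeric(1))
    best = names(scores)[which.min(scores)]
    selected = c(selected, best)
    path[[length(path) + 1]] = list(features = selected, score = min(scores))
  }
  path
}

# Toy error: useful features lower the error, every feature costs a small penalty.
weights = c(a = 0.5, b = 0.3, c = 0.0, d = 0.1)
toy_error = function(subset) 1 - sum(weights[subset]) + 0.05 * length(subset)

path = sequential_forward(names(weights), toy_error)

# The search runs until max_features; the result is the best set seen on the way.
step_scores = vapply(path, function(p) p$score, numeric(1))
best_step = path[[which.min(step_scores)]]
best_step$features
```

Backward search would mirror this loop: start from all features and, in each iteration, drop the feature whose removal gives the best score, stopping at min_features.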
Control Parameters
min_features
integer(1)
Minimum number of features. By default, 1.
max_features
integer(1)
Maximum number of features. By default, number of features in mlr3::Task.
strategy
character(1)
Search method sfs (forward search) or sbs (backward search).
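As a minimal sketch of how these control parameters might be set, the snippet below passes them to `fs()` when constructing the fselector. It assumes mlr3fselect is installed; the chosen values (backward search, stopping at two features) are only an example.

```r
library(mlr3fselect)

# backward search that stops once two features remain (example values)
fselector = fs("sequential", strategy = "sbs", min_features = 2)

# inspect the configured control parameters
fselector$param_set$values
```

Because the search stops on its own at min_features, this configuration needs no separate termination criterion.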
Super classes
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchSequential
Methods
Method optimization_path()
Returns the optimization path.
Arguments
inst
(FSelectInstanceBatchSingleCrit)
Instance optimized with FSelectorBatchSequential.
include_uhash
(logical(1))
Include uhash column?
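A hedged usage sketch for `optimization_path()`: the fselector object is kept in a variable so that the method can be called on it after the instance has been optimized. The task, learner, and resampling are borrowed from the package example; the seed is an arbitrary choice for reproducibility.

```r
library(mlr3fselect)

set.seed(1)

# keep a handle on the fselector so its method can be called afterwards
fselector = fs("sequential")

instance = fselect(
  fselector = fselector,
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce")
)

# retrieve the optimization path of the finished search
path = fselector$optimization_path(instance)
path
```

Passing `include_uhash = TRUE` should additionally include the uhash column, per the argument description above.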
Examples
# Feature Selection
# \donttest{
# attach the package, then retrieve task and load learner
library(mlr3fselect)
task = tsk("penguins")
learner = lrn("classif.rpart")
# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("sequential"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)
# best performing feature set
instance$result
#>    bill_depth bill_length body_mass flipper_length island    sex   year
#>        <lgcl>      <lgcl>    <lgcl>         <lgcl> <lgcl> <lgcl> <lgcl>
#> 1:      FALSE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#>                      features n_features classif.ce
#>                        <list>      <int>      <num>
#> 1: bill_length,flipper_length          2 0.05217391
# all evaluated feature sets
as.data.table(instance$archive)
#>     bill_depth bill_length body_mass flipper_length island    sex   year
#>         <lgcl>      <lgcl>    <lgcl>         <lgcl> <lgcl> <lgcl> <lgcl>
#>  1:       TRUE       FALSE     FALSE          FALSE  FALSE  FALSE  FALSE
#>  2:      FALSE        TRUE     FALSE          FALSE  FALSE  FALSE  FALSE
#>  3:      FALSE       FALSE      TRUE          FALSE  FALSE  FALSE  FALSE
#>  4:      FALSE       FALSE     FALSE           TRUE  FALSE  FALSE  FALSE
#>  5:      FALSE       FALSE     FALSE          FALSE   TRUE  FALSE  FALSE
#>  6:      FALSE       FALSE     FALSE          FALSE  FALSE   TRUE  FALSE
#>  7:      FALSE       FALSE     FALSE          FALSE  FALSE  FALSE   TRUE
#>  8:       TRUE       FALSE     FALSE           TRUE  FALSE  FALSE  FALSE
#>  9:      FALSE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#> 10:      FALSE       FALSE      TRUE           TRUE  FALSE  FALSE  FALSE
#> 11:      FALSE       FALSE     FALSE           TRUE   TRUE  FALSE  FALSE
#> 12:      FALSE       FALSE     FALSE           TRUE  FALSE   TRUE  FALSE
#> 13:      FALSE       FALSE     FALSE           TRUE  FALSE  FALSE   TRUE
#>     classif.ce runtime_learners           timestamp batch_nr warnings errors
#>          <num>            <num>              <POSc>    <int>    <int>  <int>
#>  1: 0.26086957            0.005 2024-08-12 16:42:56        1        0      0
#>  2: 0.26086957            0.004 2024-08-12 16:42:56        1        0      0
#>  3: 0.30434783            0.005 2024-08-12 16:42:56        1        0      0
#>  4: 0.17391304            0.005 2024-08-12 16:42:56        1        0      0
#>  5: 0.26086957            0.004 2024-08-12 16:42:56        1        0      0
#>  6: 0.58260870            0.004 2024-08-12 16:42:56        1        0      0
#>  7: 0.58260870            0.004 2024-08-12 16:42:56        1        0      0
#>  8: 0.19130435            0.006 2024-08-12 16:42:56        2        0      0
#>  9: 0.05217391            0.005 2024-08-12 16:42:56        2        0      0
#> 10: 0.15652174            0.005 2024-08-12 16:42:56        2        0      0
#> 11: 0.11304348            0.005 2024-08-12 16:42:56        2        0      0
#> 12: 0.16521739            0.005 2024-08-12 16:42:56        2        0      0
#> 13: 0.16521739            0.003 2024-08-12 16:42:56        2        0      0
#>                       features n_features  resample_result
#>                         <list>     <list>           <list>
#>  1:                 bill_depth          1 <ResampleResult>
#>  2:                bill_length          1 <ResampleResult>
#>  3:                  body_mass          1 <ResampleResult>
#>  4:             flipper_length          1 <ResampleResult>
#>  5:                     island          1 <ResampleResult>
#>  6:                        sex          1 <ResampleResult>
#>  7:                       year          1 <ResampleResult>
#>  8:  bill_depth,flipper_length          2 <ResampleResult>
#>  9: bill_length,flipper_length          2 <ResampleResult>
#> 10:   body_mass,flipper_length          2 <ResampleResult>
#> 11:      flipper_length,island          2 <ResampleResult>
#> 12:         flipper_length,sex          2 <ResampleResult>
#> 13:        flipper_length,year          2 <ResampleResult>
# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }