Function to construct an FSelectInstanceBatchSingleCrit or an FSelectInstanceBatchMultiCrit.
Usage
fsi(
task,
learner,
resampling,
measures = NULL,
terminator,
store_benchmark_result = TRUE,
store_models = FALSE,
check_values = FALSE,
callbacks = NULL,
ties_method = "least_features"
)
Arguments
- task
(mlr3::Task)
Task to operate on.
- learner
(mlr3::Learner)
Learner to optimize the feature subset for.
- resampling
(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets. Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits. Already instantiated resamplings are kept unchanged.
- measures
(mlr3::Measure or list of mlr3::Measure)
A single measure creates an FSelectInstanceBatchSingleCrit and multiple measures create an FSelectInstanceBatchMultiCrit. If NULL, the default measure is used.
- terminator
(bbotk::Terminator)
Stop criterion of the feature selection.
- store_benchmark_result
(logical(1))
Store benchmark result in archive?
- store_models
(logical(1))
Store models in benchmark result?
- check_values
(logical(1))
Check the parameters before the evaluation and the results for validity?
- callbacks
(list of CallbackBatchFSelect)
List of callbacks.
- ties_method
(character(1))
The method to break ties when selecting sets while optimizing and when selecting the best set. Can be "least_features" or "random". The option "least_features" (default) selects the feature set with the fewest features. If there are multiple best feature sets with the same number of features, one is selected randomly. The "random" method returns a random feature set from the best feature sets. Ignored if multiple measures are used.
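To make the two tie-breaking rules concrete, here is a minimal base-R sketch that applies them to a toy archive. The helper `select_best_set()` and the `score` column are hypothetical: they only mimic the documented behavior for a measure that is minimized (such as classif.ce) and are not mlr3fselect's internal implementation.

```r
# Hypothetical helper (not part of mlr3fselect): mimics the documented
# ties_method rules for a minimized measure such as classif.ce.
select_best_set = function(archive, ties_method = c("least_features", "random")) {
  ties_method = match.arg(ties_method)
  # Keep every feature set that reaches the best (lowest) score
  best = archive[archive$score == min(archive$score), , drop = FALSE]
  if (ties_method == "least_features") {
    # Among the best sets, keep only those with the fewest features
    best = best[best$n_features == min(best$n_features), , drop = FALSE]
  }
  # Break any remaining ties uniformly at random
  best[sample.int(nrow(best), 1L), , drop = FALSE]
}

# Toy archive: three feature sets share the best score of 0.07
archive = data.frame(
  features = c("bill_depth,island", "bill_length,island",
               "bill_depth,bill_length,island", "body_mass"),
  n_features = c(2L, 2L, 3L, 1L),
  score = c(0.07, 0.07, 0.07, 0.30)
)

set.seed(1)
select_best_set(archive, "least_features")  # one of the two 2-feature sets
select_best_set(archive, "random")          # any of the three best sets
```

Filtering on the score first and only then on the number of features reflects the description above: set size only decides between sets that are already tied on performance, so the single-feature set with a worse score is never chosen.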
Resources
There are several sections about feature selection in the mlr3book.
- Getting started with wrapper feature selection.
- Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
- Utilize the built-in feature importance of models with Recursive Feature Elimination.
- Run a feature selection with Shadow Variable Search.
- Feature selection on the Titanic data set.
Default Measures
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task type | Default measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
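The table can be read as a plain lookup from task type to measure ID. The sketch below encodes it as a named character vector. `default_measure_ids` and `default_measure_id()` are hypothetical names used only for illustration; they are not the mechanism mlr3 uses to resolve the default, and measures such as "surv.cindex" still require the package listed in the table.

```r
# Hypothetical lookup mirroring the Default Measures table; mlr3 resolves
# the default measure internally, this only illustrates the mapping.
default_measure_ids = c(
  classif    = "classif.ce",
  regr       = "regr.mse",
  surv       = "surv.cindex",
  dens       = "dens.logloss",
  classif_st = "classif.ce",
  regr_st    = "regr.mse",
  clust      = "clust.dunn"
)

default_measure_id = function(task_type) {
  stopifnot(is.character(task_type), length(task_type) == 1L)
  if (!task_type %in% names(default_measure_ids)) {
    stop("No default measure listed for task type '", task_type, "'")
  }
  # [[ ]] extracts a single unnamed string
  default_measure_ids[[task_type]]
}

default_measure_id("classif")  # "classif.ce"
default_measure_id("surv")     # "surv.cindex"
```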
Examples
# Feature selection on the Palmer Penguins data set
library(mlr3fselect)
# \donttest{
task = tsk("penguins")
learner = lrn("classif.rpart")
# Construct feature selection instance
instance = fsi(
task = task,
learner = learner,
resampling = rsmp("cv", folds = 3),
measures = msr("classif.ce"),
terminator = trm("evals", n_evals = 4)
)
# Choose optimization algorithm
fselector = fs("random_search", batch_size = 2)
# Run feature selection
fselector$optimize(instance)
#> bill_depth bill_length body_mass flipper_length island sex year
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> features n_features
#> <list> <int>
#> 1: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 7
#> classif.ce
#> <num>
#> 1: 0.0669972
# Subset the task to the optimal feature set
task$select(instance$result_feature_set)
# Train the learner with the optimal feature set on the full data set
learner$train(task)
# Inspect all evaluated sets
as.data.table(instance$archive)
#> bill_depth bill_length body_mass flipper_length island sex year
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: TRUE FALSE FALSE FALSE TRUE FALSE FALSE
#> 2: FALSE TRUE FALSE TRUE TRUE FALSE FALSE
#> 3: FALSE FALSE TRUE FALSE FALSE FALSE FALSE
#> 4: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> classif.ce runtime_learners timestamp batch_nr warnings errors
#> <num> <num> <POSc> <int> <int> <int>
#> 1: 0.24424104 0.012 2024-11-07 21:50:12 1 0 0
#> 2: 0.06989575 0.016 2024-11-07 21:50:12 1 0 0
#> 3: 0.29674549 0.014 2024-11-07 21:50:12 2 0 0
#> 4: 0.06699720 0.015 2024-11-07 21:50:12 2 0 0
#> features n_features
#> <list> <list>
#> 1: bill_depth,island 2
#> 2: bill_length,flipper_length,island 3
#> 3: body_mass 1
#> 4: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 7
#> resample_result
#> <list>
#> 1: <ResampleResult>
#> 2: <ResampleResult>
#> 3: <ResampleResult>
#> 4: <ResampleResult>
# }
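In the example, `rsmp("cv", folds = 3)` is passed to `fsi()` uninstantiated, so it is instantiated once during construction and every candidate feature set is scored on the same three folds. The base-R sketch below imitates that idea outside mlr3: fold IDs are drawn once and reused for every subset. The helpers `make_folds()` and `cv_mse()`, the `mtcars` data and `lm()` as the learner are illustrative assumptions, not part of mlr3fselect.

```r
# Hypothetical helpers (not mlr3 API): fixed-fold cross-validation so that
# all feature subsets are compared on identical data splits.
make_folds = function(n, k) {
  # Balanced fold IDs 1..k, shuffled once
  rep_len(seq_len(k), n)[sample.int(n)]
}

cv_mse = function(data, target, features, folds) {
  errors = vapply(sort(unique(folds)), function(f) {
    train = data[folds != f, , drop = FALSE]
    test  = data[folds == f, , drop = FALSE]
    fit = lm(reformulate(features, response = target), data = train)
    # Mean squared error on the held-out fold
    mean((test[[target]] - predict(fit, newdata = test))^2)
  }, numeric(1))
  mean(errors)
}

set.seed(42)
folds = make_folds(nrow(mtcars), k = 3)  # drawn once, like an instantiated resampling

# Every candidate subset is evaluated on the same folds
candidates = list(c("wt"), c("wt", "hp"), c("wt", "hp", "disp"))
scores = vapply(candidates, cv_mse, numeric(1),
                data = mtcars, target = "mpg", folds = folds)
data.frame(features = vapply(candidates, paste, character(1), collapse = ","),
           n_features = lengths(candidates), regr.mse = scores)
```

Redrawing the folds inside `cv_mse()` for each subset would mix split-to-split noise into the comparison; reusing one fold assignment keeps differences in the scores attributable to the feature sets themselves.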