Class for Multi Criteria Feature Selection
Source: R/FSelectInstanceBatchMultiCrit.R
The FSelectInstanceBatchMultiCrit class specifies a multi-criteria feature selection problem for an FSelector.
The function fsi() creates an FSelectInstanceBatchMultiCrit, and the function fselect() creates an instance internally.
Resources
There are several sections about feature selection in the mlr3book.
Learn about multi-objective optimization.
The gallery features a collection of case studies and demos about optimization.
Analysis
To analyze the feature selection results, pass the archive to as.data.table().
The returned data table is joined with the benchmark result, which adds the mlr3::ResampleResult for each feature set.
The archive provides various getters (e.g. $learners()) to ease access.
All getters extract by position (i) or by unique hash (uhash).
For a complete list of all getters, see the methods section of the archive class (ArchiveBatchFSelect).
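A minimal sketch of the two extraction modes, assuming the archive exposes `learners()` and `resample_result()` with `i` and `uhash` arguments as described above, and that each evaluation's hash is stored in the archive's `uhash` column. Holdout resampling is an illustrative choice that keeps the run short.

```r
library(mlr3)
library(bbotk)
library(mlr3fselect)

# Small multi-criteria instance; holdout is an illustrative choice for speed
instance = fsi(
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msrs(c("classif.ce", "time_train")),
  terminator = trm("evals", n_evals = 4)
)
fs("random_search", batch_size = 2)$optimize(instance)

# Extract by position: learners of the first evaluated feature set
# (they carry no fitted models because store_models = FALSE by default)
learners = instance$archive$learners(i = 1)

# Extract by unique hash: the ResampleResult of the same feature set
uhash = instance$archive$data$uhash[1]
rr = instance$archive$resample_result(uhash = uhash)
```

With holdout there is a single resampling iteration, so `learners` holds one learner and `rr` covers one train/test split.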
The benchmark result ($benchmark_result) allows scoring the feature sets again on a different measure.
Alternatively, measures can be supplied to as.data.table().
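A minimal sketch of both rescoring options, assuming classification accuracy (`classif.acc`) as the additional measure (an illustrative choice) and that `as.data.table()` accepts a `measures` argument for the archive, as stated above.

```r
library(mlr3)
library(bbotk)
library(data.table)
library(mlr3fselect)

instance = fsi(
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msrs(c("classif.ce", "time_train")),
  terminator = trm("evals", n_evals = 4)
)
fs("random_search", batch_size = 2)$optimize(instance)

# Option 1: score every evaluated feature set again on accuracy
# (one row per resampling iteration of each feature set)
scores = instance$archive$benchmark_result$score(msr("classif.acc"))

# Option 2: add accuracy as an extra column of the archive table
tab = as.data.table(instance$archive, measures = msrs("classif.acc"))
```

Option 1 returns per-iteration scores, while option 2 adds the aggregated score next to the optimized measures; with holdout both yield one row per feature set.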
Super classes
bbotk::OptimInstance -> bbotk::OptimInstanceBatch -> bbotk::OptimInstanceBatchMultiCrit -> FSelectInstanceBatchMultiCrit
Active bindings
result_feature_set (list of character())
Feature sets for task subsetting.
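A minimal sketch of task subsetting with the result, assuming at least one Pareto-optimal feature set is returned. `Task$select()` keeps only the given features; the task is cloned first so the original task keeps all of its features.

```r
library(mlr3)
library(bbotk)
library(mlr3fselect)

task = tsk("penguins")
instance = fsi(
  task = task,
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msrs(c("classif.ce", "time_train")),
  terminator = trm("evals", n_evals = 4)
)
fs("random_search", batch_size = 2)$optimize(instance)

# One character vector of feature names per Pareto-optimal solution
feature_sets = instance$result_feature_set

# Subset a copy of the task to the first Pareto-optimal feature set
task_subset = task$clone()$select(feature_sets[[1]])
```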
Methods
Method new()
Creates a new instance of this R6 class.
Usage
FSelectInstanceBatchMultiCrit$new(
task,
learner,
resampling,
measures,
terminator,
store_benchmark_result = TRUE,
store_models = FALSE,
check_values = FALSE,
callbacks = NULL
)
Arguments
task (mlr3::Task)
Task to operate on.
learner (mlr3::Learner)
Learner to optimize the feature subset for.
resampling (mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets. Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits. Already instantiated resamplings are kept unchanged.
measures (list of mlr3::Measure)
Measures to optimize. If NULL, mlr3's default measure is used.
terminator (bbotk::Terminator)
Stop criterion of the feature selection.
store_benchmark_result (logical(1))
Store benchmark result in archive?
store_models (logical(1))
Store models in benchmark result?
check_values (logical(1))
Check the parameters before the evaluation and the results for validity?
callbacks (list of CallbackBatchFSelect)
List of callbacks.
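A minimal sketch of calling the constructor directly instead of fsi(), with the resampling instantiated beforehand so the data splits are fixed before construction. Setting `store_models = TRUE` is an illustrative choice that keeps the fitted models in the benchmark result; the final check on `$model` assumes the archive getter `learners()` returns the learners from that benchmark result.

```r
library(mlr3)
library(bbotk)
library(mlr3fselect)

task = tsk("penguins")

# Instantiate up front: an already instantiated resampling is kept unchanged,
# so every feature subset is evaluated on these exact splits
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

instance = FSelectInstanceBatchMultiCrit$new(
  task = task,
  learner = lrn("classif.rpart"),
  resampling = resampling,
  measures = msrs(c("classif.ce", "time_train")),
  terminator = trm("evals", n_evals = 4),
  store_models = TRUE  # keep fitted models in the benchmark result
)

fs("random_search", batch_size = 2)$optimize(instance)

# With store_models = TRUE, the learners in the archive carry fitted models
first_model = instance$archive$learners(i = 1)[[1]]$model
```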
Method assign_result()
The FSelector object writes the best found feature subsets and estimated performance values here. For internal use.
Arguments
xdt (data.table::data.table())
x values as data.table. Each row is one point. Contains the values in the search space of the FSelectInstanceBatchMultiCrit object. Can contain additional columns for extra information.
ydt (data.table::data.table())
Optimal outcomes, e.g. the Pareto front.
Examples
# Feature selection on Palmer Penguins data set
# \donttest{
task = tsk("penguins")
# Construct feature selection instance
instance = fsi(
task = task,
learner = lrn("classif.rpart"),
resampling = rsmp("cv", folds = 3),
measures = msrs(c("classif.ce", "time_train")),
terminator = trm("evals", n_evals = 4)
)
# Choose optimization algorithm
fselector = fs("random_search", batch_size = 2)
# Run feature selection
fselector$optimize(instance)
#> bill_depth bill_length body_mass flipper_length island sex year
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: TRUE FALSE TRUE TRUE FALSE TRUE TRUE
#> 2: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> features n_features
#> <list> <int>
#> 1: bill_depth,body_mass,flipper_length,sex,year 5
#> 2: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 5
#> classif.ce time_train
#> <num> <num>
#> 1: 0.2035850 0.002333333
#> 2: 0.0698449 0.003000000
# Optimal feature sets
instance$result_feature_set
#> [[1]]
#> [1] "bill_depth" "body_mass" "flipper_length" "sex"
#> [5] "year"
#>
#> [[2]]
#> [1] "bill_depth" "bill_length" "body_mass" "flipper_length"
#> [5] "island" "sex" "year"
#>
# Inspect all evaluated sets
as.data.table(instance$archive)
#> bill_depth bill_length body_mass flipper_length island sex year
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: TRUE TRUE TRUE TRUE FALSE TRUE TRUE
#> 2: TRUE FALSE TRUE TRUE FALSE TRUE TRUE
#> 3: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 4: FALSE TRUE TRUE TRUE FALSE FALSE FALSE
#> classif.ce time_train runtime_learners timestamp batch_nr
#> <num> <num> <num> <POSc> <int>
#> 1: 0.08151538 0.003333333 0.016 2024-09-10 08:26:40 1
#> 2: 0.20358505 0.002333333 0.013 2024-09-10 08:26:40 1
#> 3: 0.06984490 0.003000000 0.015 2024-09-10 08:26:41 2
#> 4: 0.08151538 0.003000000 0.015 2024-09-10 08:26:41 2
#> warnings errors
#> <int> <int>
#> 1: 0 0
#> 2: 0 0
#> 3: 0 0
#> 4: 0 0
#> features n_features
#> <list> <list>
#> 1: bill_depth,bill_length,body_mass,flipper_length,sex,year 6
#> 2: bill_depth,body_mass,flipper_length,sex,year 5
#> 3: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 7
#> 4: bill_length,body_mass,flipper_length 3
#> resample_result
#> <list>
#> 1: <ResampleResult>
#> 2: <ResampleResult>
#> 3: <ResampleResult>
#> 4: <ResampleResult>
# }