
Function to select the optimal feature subset for a mlr3::Learner. The function internally creates an FSelectInstanceSingleCrit or FSelectInstanceMultiCrit, which describes the feature selection problem. It executes the feature selection with the FSelector (method) and returns the fselect instance, which holds the result ($result). The ArchiveFSelect ($archive) stores all evaluated feature subsets and their performance scores.

Usage

fselect(
  fselector,
  task,
  learner,
  resampling,
  measures = NULL,
  term_evals = NULL,
  term_time = NULL,
  terminator = NULL,
  store_benchmark_result = TRUE,
  store_models = FALSE,
  check_values = FALSE,
  callbacks = list()
)

Arguments

fselector

(FSelector)
Optimization algorithm.

task

(mlr3::Task)
Task to operate on.

learner

(mlr3::Learner)
Learner to optimize the feature subset for.

resampling

(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets. Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits. Already instantiated resamplings are kept unchanged.
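A minimal sketch, assuming the pima task, classif.rpart and a 3-fold cross-validation chosen only for illustration. Instantiating the resampling before the call fixes the splits up front, and fselect() keeps them as they are, so the same splits can be reused elsewhere, e.g. to compare against another learner on identical data.

```r
library(mlr3)
library(mlr3fselect)

task = tsk("pima")

# Instantiate the resampling up front; fselect() leaves these splits unchanged
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

instance = fselect(
  fselector = fs("random_search"),
  task = task,
  learner = lrn("classif.rpart"),
  resampling = resampling,
  measures = msr("classif.ce"),
  term_evals = 4
)

# Each evaluated feature subset was resampled on the same 3 folds
rr = instance$archive$resample_result(i = 1)
rr$iters
identical(rr$resampling$train_set(1), resampling$train_set(1))
```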

measures

(mlr3::Measure or list of mlr3::Measure)
A single measure creates an FSelectInstanceSingleCrit and multiple measures an FSelectInstanceMultiCrit. If NULL, the default measure for the task type is used.
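With more than one measure, the result is a set of non-dominated (Pareto-optimal) feature subsets rather than a single best one. The sketch below assumes the binary pima task and pairs classif.ce with classif.fpr purely as an illustrative trade-off.

```r
library(mlr3)
library(mlr3fselect)

instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  # Two measures -> multi-criteria feature selection
  measures = msrs(c("classif.ce", "classif.fpr")),
  term_evals = 10
)

# Every archive row carries a score for both measures
archive = as.data.table(instance$archive)
names(archive)

# The result holds the non-dominated feature subsets
instance$result
```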

term_evals

(integer(1))
Number of allowed evaluations. Ignored if terminator is passed.

term_time

(integer(1))
Maximum allowed time in seconds. Ignored if terminator is passed.

terminator

(Terminator)
Stop criterion of the feature selection.
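A minimal sketch of passing a Terminator directly, assuming the pima task and classif.rpart purely for illustration. Because terminator takes precedence, the term_evals = 4 shortcut below is ignored and the search runs until trm("evals", n_evals = 20) is satisfied.

```r
library(mlr3)
library(mlr3fselect)

instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4,                          # ignored because terminator is passed
  terminator = trm("evals", n_evals = 20)  # the criterion that is actually used
)

# At least 20 feature subsets have been evaluated
nrow(instance$archive$data)
```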

store_benchmark_result

(logical(1))
Store benchmark result in archive?

store_models

(logical(1))
Store models in benchmark result?
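A minimal sketch, again assuming pima and classif.rpart for illustration. With store_models = TRUE the fitted models are kept in the benchmark result and can be retrieved from the archive with the $learners() getter.

```r
library(mlr3)
library(mlr3fselect)

instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4,
  store_models = TRUE
)

# Learners of the first evaluated feature subset, one per resampling iteration
learners = instance$archive$learners(i = 1)

# The fitted rpart model is available because store_models = TRUE
learners[[1]]$model
```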

check_values

(logical(1))
Check the parameters before the evaluation and the results for validity?

callbacks

(list of CallbackFSelect)
List of callbacks.

Details

The mlr3::Task, mlr3::Learner, mlr3::Resampling, mlr3::Measure and Terminator are used to construct an FSelectInstanceSingleCrit. If multiple performance Measures are supplied, an FSelectInstanceMultiCrit is created. The parameters term_evals and term_time are shortcuts to create a Terminator. If both parameters are passed, a TerminatorCombo is constructed. For other Terminators, pass one with terminator. If no termination criterion is needed, set term_evals, term_time and terminator to NULL.
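The sketch below, with illustrative budgets of 10 evaluations and 60 seconds, checks that the shortcut arguments produce a TerminatorCombo and builds a comparable terminator by hand with trm("combo") from bbotk. Setting any = TRUE explicitly is an assumption made for clarity; it is the default of TerminatorCombo and means the search stops once either budget is used up.

```r
library(mlr3)
library(mlr3fselect)

# Shortcut form: with both budgets given, fselect() builds a TerminatorCombo
instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 10,
  term_time = 60
)
instance$terminator

# Explicit form: combine the same criteria yourself;
# any = TRUE stops as soon as one of them is met
terminator = trm("combo",
  list(trm("evals", n_evals = 10), trm("run_time", secs = 60)),
  any = TRUE
)
```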

Resources

There are several sections about feature selection in the mlr3book.

The gallery features a collection of case studies and demos about optimization.

Analysis

For analyzing the feature selection results, it is recommended to pass the archive to as.data.table(). The returned data table is joined with the benchmark result which adds the mlr3::ResampleResult for each feature set.

The archive provides various getters (e.g. $learners()) to ease access. All getters extract by position (i) or unique hash (uhash). For a complete list of all getters, see the methods section of ArchiveFSelect.
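A minimal sketch of both access modes, assuming pima and classif.rpart for illustration. It reads the uhash of an archive row from $archive$data, the underlying data table, and then calls the resample_result() getter once by position and once by hash.

```r
library(mlr3)
library(mlr3fselect)

instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

# By position: ResampleResult of the third evaluated feature subset
rr_pos = instance$archive$resample_result(i = 3)

# By unique hash: look up the same entry via its uhash
uhash = instance$archive$data$uhash[3]
rr_hash = instance$archive$resample_result(uhash = uhash)

# Both getters point at the same evaluation
rr_pos$uhash == rr_hash$uhash
```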

The benchmark result ($benchmark_result) allows scoring the feature sets again on a different measure. Alternatively, measures can be supplied to as.data.table().
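A minimal sketch of both routes, assuming classif.acc as the additional measure. $score() returns one row per resampling iteration (with holdout, one per feature set), while the measures argument of as.data.table() adds the aggregated score as an extra column of the archive table.

```r
library(mlr3)
library(mlr3fselect)

instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

# Route 1: rescore the stored benchmark result on accuracy
scores = instance$archive$benchmark_result$score(msr("classif.acc"))

# Route 2: add the measure while converting the archive
tab = as.data.table(instance$archive, measures = msrs("classif.acc"))
```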

Examples

library(mlr3)
library(mlr3fselect)

# Feature selection on the Pima Indians Diabetes data set
task = tsk("pima")
learner = lrn("classif.rpart")

# Run feature selection
instance = fselect(
  fselector = fs("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4)

# Subset task to optimized feature set
task$select(instance$result_feature_set)

# Train the learner with optimal feature set on the full data set
learner$train(task)

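# Inspect all evaluated feature sets; random search evaluates in batches
# (batch_size = 10 by default), so the archive can hold more than term_evals rows
as.data.table(instance$archive)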
#>       age glucose insulin  mass pedigree pregnant pressure triceps classif.ce
#>  1:  TRUE    TRUE    TRUE  TRUE     TRUE     TRUE     TRUE    TRUE  0.2539062
#>  2:  TRUE    TRUE    TRUE  TRUE    FALSE    FALSE    FALSE   FALSE  0.2656250
#>  3: FALSE   FALSE   FALSE FALSE    FALSE    FALSE     TRUE    TRUE  0.3671875
#>  4:  TRUE    TRUE    TRUE  TRUE    FALSE    FALSE     TRUE    TRUE  0.2578125
#>  5: FALSE   FALSE   FALSE FALSE     TRUE     TRUE     TRUE   FALSE  0.3007812
#>  6: FALSE   FALSE    TRUE FALSE    FALSE    FALSE     TRUE   FALSE  0.3828125
#>  7:  TRUE   FALSE    TRUE FALSE    FALSE     TRUE    FALSE    TRUE  0.2890625
#>  8:  TRUE    TRUE    TRUE  TRUE     TRUE     TRUE     TRUE    TRUE  0.2539062
#>  9:  TRUE   FALSE   FALSE FALSE    FALSE     TRUE     TRUE    TRUE  0.3007812
#> 10: FALSE   FALSE   FALSE  TRUE     TRUE     TRUE     TRUE    TRUE  0.3085938
#>     runtime_learners           timestamp batch_nr warnings errors
#>  1:            0.016 2023-07-05 08:07:16        1        0      0
#>  2:            0.015 2023-07-05 08:07:16        1        0      0
#>  3:            0.013 2023-07-05 08:07:16        1        0      0
#>  4:            0.015 2023-07-05 08:07:16        1        0      0
#>  5:            0.015 2023-07-05 08:07:16        1        0      0
#>  6:            0.013 2023-07-05 08:07:16        1        0      0
#>  7:            0.015 2023-07-05 08:07:16        1        0      0
#>  8:            0.014 2023-07-05 08:07:16        1        0      0
#>  9:            0.015 2023-07-05 08:07:16        1        0      0
#> 10:            0.018 2023-07-05 08:07:16        1        0      0
#>                                           features      resample_result
#>  1: age,glucose,insulin,mass,pedigree,pregnant,... <ResampleResult[21]>
#>  2:                       age,glucose,insulin,mass <ResampleResult[21]>
#>  3:                               pressure,triceps <ResampleResult[21]>
#>  4:      age,glucose,insulin,mass,pressure,triceps <ResampleResult[21]>
#>  5:                     pedigree,pregnant,pressure <ResampleResult[21]>
#>  6:                               insulin,pressure <ResampleResult[21]>
#>  7:                   age,insulin,pregnant,triceps <ResampleResult[21]>
#>  8: age,glucose,insulin,mass,pedigree,pregnant,... <ResampleResult[21]>
#>  9:                  age,pregnant,pressure,triceps <ResampleResult[21]>
#> 10:        mass,pedigree,pregnant,pressure,triceps <ResampleResult[21]>