
The AutoFSelector wraps a mlr3::Learner and augments it with automatic feature selection. The auto_fselector() function creates an AutoFSelector object.

Details

The AutoFSelector is a mlr3::Learner which wraps another mlr3::Learner and performs the following steps during $train():

  1. The wrapped (inner) learner is trained and evaluated on candidate feature subsets via resampling. The feature selection is configured by providing an FSelector, a bbotk::Terminator, a mlr3::Resampling and a mlr3::Measure.

  2. A final model is fit on the complete training data with the best-found feature subset.

During $predict() the AutoFSelector just calls the predict method of the wrapped (inner) learner.
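
The two training steps can be sketched by hand. This is only an illustration of the idea, not the internal implementation: it assumes the fselect() helper from mlr3fselect (not mentioned on this page) with the argument names shown, and it uses the penguins task from the examples.

```r
library(mlr3)
library(mlr3fselect)

task = tsk("penguins")
learner = lrn("classif.rpart")

# Step 1: evaluate candidate feature subsets with an inner resampling
# (fselect() is assumed here as the manual counterpart of the wrapped search)
instance = fselect(
  fselector = fs("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4)

# Step 2: fit a final model on the complete data, restricted to the best subset
best = instance$result_feature_set
task_best = task$clone()$select(best)
learner$train(task_best)

# Prediction then uses the ordinary predict method of the trained learner
prediction = learner$predict(task_best)
```

An AutoFSelector bundles these steps into a single learner, so it can be passed to any function that accepts a mlr3::Learner.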

Resources

There are several sections about feature selection in the mlr3book.

  • Estimate Model Performance with nested resampling (Tuning workflow is transferable to feature selection).

  • Automate the feature selection.

The gallery features a collection of case studies and demos about optimization.

Nested Resampling

Nested resampling can be performed by passing an AutoFSelector object to mlr3::resample() or mlr3::benchmark(). To access the inner resampling results, set store_fselect_instance = TRUE and execute mlr3::resample() or mlr3::benchmark() with store_models = TRUE (see examples). The mlr3::Resampling passed to the AutoFSelector is meant to be the inner resampling, operating on the training set of an arbitrary outer resampling. For this reason it is not feasible to pass an instantiated mlr3::Resampling here.
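
As a minimal sketch of accessing the inner results, the following assumes the extract_inner_fselect_archives() helper from mlr3fselect, which is not described on this page. The inner resampling is passed uninstantiated, and the outer resampling is a 3-fold cross-validation chosen only for illustration.

```r
library(mlr3)
library(mlr3fselect)

afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),      # inner resampling, left uninstantiated
  measure = msr("classif.ce"),
  term_evals = 4,
  store_fselect_instance = TRUE)     # keep the inner instance in each model

# the outer resampling drives the nested loop; store_models = TRUE keeps the fitted AutoFSelectors
rr = resample(tsk("penguins"), afs, rsmp("cv", folds = 3), store_models = TRUE)

# evaluated feature subsets of every outer iteration, stacked into one table
inner_archives = extract_inner_fselect_archives(rr)
```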

Super class

mlr3::Learner -> AutoFSelector

Public fields

instance_args

(list())
All arguments from construction to create the FSelectInstanceSingleCrit.

fselector

(FSelector)
Optimization algorithm.

Active bindings

archive

(ArchiveFSelect)
Returns FSelectInstanceSingleCrit archive.

learner

(mlr3::Learner)
Trained learner.

fselect_instance

(FSelectInstanceSingleCrit)
Internally created feature selection instance with all intermediate results.

fselect_result

(data.table::data.table)
Short-cut to $result from FSelectInstanceSingleCrit.

predict_type

(character(1))
Stores the currently active predict type, e.g. "response". Must be an element of $predict_types.

hash

(character(1))
Hash (unique identifier) for this object.

Methods


Method new()

Creates a new instance of this R6 class.

Usage

AutoFSelector$new(
  fselector,
  learner,
  resampling,
  measure = NULL,
  terminator,
  store_fselect_instance = TRUE,
  store_benchmark_result = TRUE,
  store_models = FALSE,
  check_values = FALSE,
  callbacks = list()
)

Arguments

fselector

(FSelector)
Optimization algorithm.

learner

(mlr3::Learner)
Learner to optimize the feature subset for.

resampling

(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets. Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits. Already instantiated resamplings are kept unchanged.

measure

(mlr3::Measure)
Measure to optimize. If NULL, default measure is used.

terminator

(Terminator)
Stop criterion of the feature selection.

store_fselect_instance

(logical(1))
If TRUE (default), stores the internally created FSelectInstanceSingleCrit with all intermediate results in slot $fselect_instance. Is set to TRUE if store_models = TRUE.

store_benchmark_result

(logical(1))
Store benchmark result in archive?

store_models

(logical(1))
Store models in benchmark result?

check_values

(logical(1))
Check the parameters before the evaluation and the results for validity?

callbacks

(list of CallbackFSelect)
List of callbacks.
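
A minimal sketch of calling the constructor directly. Unlike auto_fselector() in the examples, $new() takes a terminator object rather than term_evals, so this sketch assumes bbotk::trm("evals", n_evals = 4) as the stop criterion; the learner, resampling and measure are arbitrary choices for illustration.

```r
library(mlr3)
library(mlr3fselect)

afs = AutoFSelector$new(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = bbotk::trm("evals", n_evals = 4),  # stop after 4 evaluations (assumed budget)
  store_fselect_instance = TRUE)

# the result behaves like any other mlr3 learner
afs$train(tsk("penguins"))
```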


Method base_learner()

Extracts the base learner from nested learner objects like GraphLearner in mlr3pipelines. If recursive = 0, the (tuned) learner is returned.

Usage

AutoFSelector$base_learner(recursive = Inf)

Arguments

recursive

(integer(1))
Depth of recursion for multiple nested objects.

Returns

Learner.
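
For illustration, a short sketch of base_learner() on an AutoFSelector that wraps a plain rpart learner. With no further nesting (for example no GraphLearner), the call is expected to return the wrapped learner itself.

```r
library(mlr3)
library(mlr3fselect)

afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)
afs$train(tsk("penguins"))

# unwrap the AutoFSelector down to the innermost learner
base = afs$base_learner()
```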


Method importance()

The importance scores of the final model.

Usage

AutoFSelector$importance()

Returns

Named numeric().


Method selected_features()

The selected features of the final model. These features are selected internally by the learner.

Usage

AutoFSelector$selected_features()

Returns

character().
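
A brief sketch of importance() and selected_features() after training. It assumes a wrapped learner that has the importance and selected_features properties, as classif.rpart does in the examples; other learners may not support these methods.

```r
library(mlr3)
library(mlr3fselect)

afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)
afs$train(tsk("penguins"))

# feature subset chosen by the feature selection
best = afs$fselect_result$features[[1]]

# importance scores and internally selected features of the final model
imp = afs$importance()
sel = afs$selected_features()
```

Both results refer to the final model, which was fit on the best-found feature subset only, so they should contain no feature outside that subset.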


Method oob_error()

The out-of-bag error of the final model.

Usage

AutoFSelector$oob_error()

Returns

numeric(1).


Method loglik()

The log-likelihood of the final model.

Usage

AutoFSelector$loglik()

Returns

logLik.


Method print()

Printer.

Usage

AutoFSelector$print()

Arguments

...

(ignored).


Method clone()

The objects of this class are cloneable with this method.

Usage

AutoFSelector$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Automatic Feature Selection
# \donttest{

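# load the packages that provide tsk(), lrn(), rsmp(), msr(), fs() and auto_fselector()
library(mlr3)
library(mlr3fselect)

# split into a training set and an external test set
task = tsk("penguins")
split = partition(task, ratio = 0.8)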

# create auto fselector
afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)

# optimize feature subset and fit final model
afs$train(task, row_ids = split$train)

# predict with final model
afs$predict(task, row_ids = split$test)
#> <PredictionClassif> for 69 observations:
#>     row_ids     truth  response
#>           4    Adelie    Adelie
#>          10    Adelie    Adelie
#>          11    Adelie    Adelie
#> ---                            
#>         334 Chinstrap Chinstrap
#>         337 Chinstrap Chinstrap
#>         338 Chinstrap Chinstrap

# show result
afs$fselect_result
#>    bill_depth bill_length body_mass flipper_length island   sex year
#> 1:       TRUE        TRUE      TRUE           TRUE  FALSE FALSE TRUE
#>                                                features classif.ce
#> 1: bill_depth,bill_length,body_mass,flipper_length,year 0.05434783

# model slot contains trained learner and fselect instance
afs$model
#> $learner
#> <LearnerClassifRpart:classif.rpart>: Classification Tree
#> * Model: rpart
#> * Parameters: xval=0
#> * Packages: mlr3, rpart
#> * Predict Types:  [response], prob
#> * Feature Types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, multiclass, selected_features,
#>   twoclass, weights
#> 
#> $features
#> [1] "bill_depth"     "bill_length"    "body_mass"      "flipper_length"
#> [5] "year"          
#> 
#> $fselect_instance
#> <FSelectInstanceSingleCrit>
#> * State:  Optimized
#> * Objective: <ObjectiveFSelect:classif.rpart_on_penguins>
#> * Terminator: <TerminatorEvals>
#> * Result:
#>    bill_depth bill_length body_mass flipper_length island   sex year classif.ce
#> 1:       TRUE        TRUE      TRUE           TRUE  FALSE FALSE TRUE 0.05434783
#> * Archive:
#>    bill_depth bill_length body_mass flipper_length island   sex  year
#> 1:       TRUE        TRUE      TRUE           TRUE  FALSE FALSE  TRUE
#> 2:       TRUE        TRUE      TRUE           TRUE   TRUE  TRUE  TRUE
#> 3:       TRUE        TRUE      TRUE           TRUE   TRUE  TRUE  TRUE
#> 4:       TRUE        TRUE     FALSE           TRUE   TRUE FALSE FALSE
#>    classif.ce
#> 1: 0.05434783
#> 2: 0.05434783
#> 3: 0.05434783
#> 4: 0.05434783
#> 

# shortcut trained learner
afs$learner
#> <LearnerClassifRpart:classif.rpart>: Classification Tree
#> * Model: rpart
#> * Parameters: xval=0
#> * Packages: mlr3, rpart
#> * Predict Types:  [response], prob
#> * Feature Types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, multiclass, selected_features,
#>   twoclass, weights

# shortcut fselect instance
afs$fselect_instance
#> <FSelectInstanceSingleCrit>
#> * State:  Optimized
#> * Objective: <ObjectiveFSelect:classif.rpart_on_penguins>
#> * Terminator: <TerminatorEvals>
#> * Result:
#>    bill_depth bill_length body_mass flipper_length island   sex year classif.ce
#> 1:       TRUE        TRUE      TRUE           TRUE  FALSE FALSE TRUE 0.05434783
#> * Archive:
#>    bill_depth bill_length body_mass flipper_length island   sex  year
#> 1:       TRUE        TRUE      TRUE           TRUE  FALSE FALSE  TRUE
#> 2:       TRUE        TRUE      TRUE           TRUE   TRUE  TRUE  TRUE
#> 3:       TRUE        TRUE      TRUE           TRUE   TRUE  TRUE  TRUE
#> 4:       TRUE        TRUE     FALSE           TRUE   TRUE FALSE FALSE
#>    classif.ce
#> 1: 0.05434783
#> 2: 0.05434783
#> 3: 0.05434783
#> 4: 0.05434783


# Nested Resampling

afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)

resampling_outer = rsmp("cv", folds = 3)
rr = resample(task, afs, resampling_outer, store_models = TRUE)

# retrieve inner feature selection results.
extract_inner_fselect_results(rr)
#>    iteration bill_depth bill_length body_mass flipper_length island   sex year
#> 1:         1      FALSE        TRUE      TRUE           TRUE   TRUE  TRUE TRUE
#> 2:         2       TRUE        TRUE      TRUE           TRUE   TRUE FALSE TRUE
#> 3:         3      FALSE        TRUE      TRUE           TRUE   TRUE  TRUE TRUE
#>    classif.ce                                                    features
#> 1: 0.07894737        bill_length,body_mass,flipper_length,island,sex,year
#> 2: 0.10526316 bill_depth,bill_length,body_mass,flipper_length,island,year
#> 3: 0.05194805        bill_length,body_mass,flipper_length,island,sex,year
#>     task_id              learner_id resampling_id
#> 1: penguins classif.rpart.fselector            cv
#> 2: penguins classif.rpart.fselector            cv
#> 3: penguins classif.rpart.fselector            cv

# performance scores estimated on the outer resampling
rr$score()
#>                 task  task_id             learner              learner_id
#> 1: <TaskClassif[50]> penguins <AutoFSelector[46]> classif.rpart.fselector
#> 2: <TaskClassif[50]> penguins <AutoFSelector[46]> classif.rpart.fselector
#> 3: <TaskClassif[50]> penguins <AutoFSelector[46]> classif.rpart.fselector
#>            resampling resampling_id iteration              prediction
#> 1: <ResamplingCV[20]>            cv         1 <PredictionClassif[20]>
#> 2: <ResamplingCV[20]>            cv         2 <PredictionClassif[20]>
#> 3: <ResamplingCV[20]>            cv         3 <PredictionClassif[20]>
#>    classif.ce
#> 1: 0.05217391
#> 2: 0.07826087
#> 3: 0.09649123

# unbiased performance of the final model trained on the full data set
rr$aggregate()
#> classif.ce 
#>   0.075642 
# }