The AutoFSelector wraps a mlr3::Learner and augments it with automatic feature selection.
The auto_fselector() function creates an AutoFSelector object.
Usage
auto_fselector(
  fselector,
  learner,
  resampling,
  measure = NULL,
  term_evals = NULL,
  term_time = NULL,
  terminator = NULL,
  store_fselect_instance = TRUE,
  store_benchmark_result = TRUE,
  store_models = FALSE,
  check_values = FALSE,
  callbacks = list()
)
Arguments
- fselector
(FSelector)
Optimization algorithm.
- learner
(mlr3::Learner)
Learner to optimize the feature subset for.
- resampling
(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets. Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits. Already instantiated resamplings are kept unchanged.
- measure
(mlr3::Measure)
Measure to optimize. If NULL, the default measure is used.
- term_evals
(integer(1))
Number of allowed evaluations. Ignored if terminator is passed.
- term_time
(integer(1))
Maximum allowed time in seconds. Ignored if terminator is passed.
- terminator
(Terminator)
Stop criterion of the feature selection.
- store_fselect_instance
(logical(1))
If TRUE (default), stores the internally created FSelectInstanceSingleCrit with all intermediate results in slot $fselect_instance. Is set to TRUE if store_models = TRUE.
- store_benchmark_result
(logical(1))
If TRUE (default), stores the benchmark result in the archive.
- store_models
(logical(1))
If TRUE, stores the fitted models in the benchmark result. Default is FALSE.
- check_values
(logical(1))
If TRUE, checks the parameters before the evaluation and the results for validity. Default is FALSE.
- callbacks
(list of CallbackFSelect)
List of callbacks.
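The two ways of setting a budget described above can be sketched as follows. This is a minimal illustration, not part of the original page: the object names afs_evals, afs_term and terminator are made up for this example, and the combined terminator uses trm() from bbotk, which is assumed to be installed alongside mlr3fselect.

```r
library(mlr3fselect)

# Option 1: shorthand budget, at most 10 evaluated feature subsets
afs_evals = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)

# Option 2: explicit terminator that stops after 10 evaluations
# or 30 seconds, whichever is reached first (any = TRUE)
terminator = bbotk::trm("combo",
  terminators = list(
    bbotk::trm("evals", n_evals = 10),
    bbotk::trm("run_time", secs = 30)
  ),
  any = TRUE
)

# term_evals and term_time are left at NULL because they are
# ignored once a terminator is passed
afs_term = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = terminator
)
```

Passing a terminator object is mainly useful when a single evaluation or time limit is not expressive enough, for example when combining several stop criteria.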
Details
The AutoFSelector is a mlr3::Learner which wraps another mlr3::Learner and performs the following steps during $train():
- The wrapped (inner) learner is trained on the feature subsets via resampling. The feature selection can be specified by providing a FSelector, a bbotk::Terminator, a mlr3::Resampling and a mlr3::Measure.
- A final model is fit on the complete training data with the best-found feature subset.
During $predict(), the AutoFSelector just calls the predict method of the wrapped (inner) learner.
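To make the two training steps concrete, the following sketch performs them by hand with fselect() instead of an AutoFSelector. It only approximates what $train() does and is not the package's internal code. It assumes a version of mlr3fselect in which fselect() takes the fselector as its first argument and the measure via measures; the names instance and best_features are chosen for this example.

```r
library(mlr3fselect)

task = tsk("penguins")
learner = lrn("classif.rpart")

# Step 1: evaluate feature subsets with the inner resampling
instance = fselect(
  fselector = fs("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

# best-found feature subset as a character vector
best_features = instance$result_feature_set

# Step 2: fit the final model on all rows, restricted to that subset
task$select(best_features)
learner$train(task)
```

The AutoFSelector bundles both steps into a single learner, which is what allows it to be passed to mlr3::resample() or mlr3::benchmark() like any other learner.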
Resources
There are several sections about feature selection in the mlr3book.
- Estimate Model Performance with nested resampling (Tuning workflow is transferable to feature selection).
- Automate the feature selection.
The gallery features a collection of case studies and demos about optimization.
Nested Resampling
Nested resampling can be performed by passing an AutoFSelector object to mlr3::resample() or mlr3::benchmark().
To access the inner resampling results, set store_fselect_instance = TRUE and execute mlr3::resample() or mlr3::benchmark() with store_models = TRUE (see examples).
The mlr3::Resampling passed to the AutoFSelector is meant to be the inner resampling, operating on the training set of an arbitrary outer resampling.
For this reason it is not feasible to pass an instantiated mlr3::Resampling here.
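The same idea carries over to mlr3::benchmark(). The sketch below compares an AutoFSelector with the plain learner on the same outer resampling and then pulls the inner results from the AutoFSelector's resample result. The variable names (design, bmr, aggr, rr_afs, inner) are chosen for this example, and the learner id "classif.rpart.fselector" is taken from the example output further down this page.

```r
library(mlr3fselect)

task = tsk("penguins")

afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),   # inner resampling, not instantiated
  measure = msr("classif.ce"),
  term_evals = 4,
  store_fselect_instance = TRUE
)

# outer resampling shared by the AutoFSelector and a baseline learner
design = benchmark_grid(
  tasks = task,
  learners = list(afs, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)

# store_models = TRUE keeps the fitted AutoFSelectors and their instances
bmr = benchmark(design, store_models = TRUE)

# aggregated outer performance, one row per learner
aggr = bmr$aggregate(msr("classif.ce"))

# inner feature selection results of the AutoFSelector, one row per outer fold
rr_afs = aggr$resample_result[[which(aggr$learner_id == "classif.rpart.fselector")]]
inner = extract_inner_fselect_results(rr_afs)
```

Because the feature subsets are chosen at random, the selected features and scores differ between runs unless a seed is set.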
Examples
# Automatic Feature Selection
# \donttest{
# load the package (also attaches mlr3)
library(mlr3fselect)
# split the task into a training set and an external test set
task = tsk("penguins")
split = partition(task, ratio = 0.8)
# create auto fselector
afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)
# optimize feature subset and fit final model
afs$train(task, row_ids = split$train)
# predict with final model
afs$predict(task, row_ids = split$test)
#> <PredictionClassif> for 69 observations:
#> row_ids truth response
#> 2 Adelie Adelie
#> 3 Adelie Adelie
#> 7 Adelie Adelie
#> ---
#> 340 Chinstrap Gentoo
#> 342 Chinstrap Chinstrap
#> 343 Chinstrap Gentoo
# show result
afs$fselect_result
#> bill_depth bill_length body_mass flipper_length island sex year
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> features classif.ce
#> 1: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 0.02173913
# model slot contains trained learner and fselect instance
afs$model
#> $learner
#> <LearnerClassifRpart:classif.rpart>: Classification Tree
#> * Model: rpart
#> * Parameters: xval=0
#> * Packages: mlr3, rpart
#> * Predict Types: [response], prob
#> * Feature Types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, multiclass, selected_features,
#> twoclass, weights
#>
#> $features
#> [1] "bill_depth" "bill_length" "body_mass" "flipper_length"
#> [5] "island" "sex" "year"
#>
#> $fselect_instance
#> <FSelectInstanceSingleCrit>
#> * State: Optimized
#> * Objective: <ObjectiveFSelect:classif.rpart_on_penguins>
#> * Terminator: <TerminatorEvals>
#> * Result:
#> bill_depth bill_length body_mass flipper_length island sex year classif.ce
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE 0.02173913
#> * Archive:
#> bill_depth bill_length body_mass flipper_length island sex year
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 2: TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> 3: FALSE FALSE FALSE FALSE FALSE TRUE FALSE
#> 4: FALSE TRUE FALSE FALSE FALSE TRUE FALSE
#> classif.ce
#> 1: 0.02173913
#> 2: 0.08695652
#> 3: 0.59782609
#> 4: 0.32608696
#>
# shortcut to the trained learner
afs$learner
#> <LearnerClassifRpart:classif.rpart>: Classification Tree
#> * Model: rpart
#> * Parameters: xval=0
#> * Packages: mlr3, rpart
#> * Predict Types: [response], prob
#> * Feature Types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, multiclass, selected_features,
#> twoclass, weights
# shortcut to the fselect instance
afs$fselect_instance
#> <FSelectInstanceSingleCrit>
#> * State: Optimized
#> * Objective: <ObjectiveFSelect:classif.rpart_on_penguins>
#> * Terminator: <TerminatorEvals>
#> * Result:
#> bill_depth bill_length body_mass flipper_length island sex year classif.ce
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE 0.02173913
#> * Archive:
#> bill_depth bill_length body_mass flipper_length island sex year
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 2: TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> 3: FALSE FALSE FALSE FALSE FALSE TRUE FALSE
#> 4: FALSE TRUE FALSE FALSE FALSE TRUE FALSE
#> classif.ce
#> 1: 0.02173913
#> 2: 0.08695652
#> 3: 0.59782609
#> 4: 0.32608696
# Nested Resampling
afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)
resampling_outer = rsmp("cv", folds = 3)
rr = resample(task, afs, resampling_outer, store_models = TRUE)
# retrieve inner feature selection results.
extract_inner_fselect_results(rr)
#> iteration bill_depth bill_length body_mass flipper_length island sex year
#> 1: 1 TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 2: 2 TRUE TRUE TRUE TRUE TRUE FALSE FALSE
#> 3: 3 TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> classif.ce features
#> 1: 0.05263158 bill_depth,bill_length,body_mass,flipper_length,island,sex,...
#> 2: 0.03947368 bill_depth,bill_length,body_mass,flipper_length,island
#> 3: 0.06493506 bill_depth,bill_length,body_mass,flipper_length,island,sex,...
#> task_id learner_id resampling_id
#> 1: penguins classif.rpart.fselector cv
#> 2: penguins classif.rpart.fselector cv
#> 3: penguins classif.rpart.fselector cv
# performance scores estimated on the outer resampling
rr$score()
#> task task_id learner learner_id
#> 1: <TaskClassif[50]> penguins <AutoFSelector[46]> classif.rpart.fselector
#> 2: <TaskClassif[50]> penguins <AutoFSelector[46]> classif.rpart.fselector
#> 3: <TaskClassif[50]> penguins <AutoFSelector[46]> classif.rpart.fselector
#> resampling resampling_id iteration prediction
#> 1: <ResamplingCV[20]> cv 1 <PredictionClassif[20]>
#> 2: <ResamplingCV[20]> cv 2 <PredictionClassif[20]>
#> 3: <ResamplingCV[20]> cv 3 <PredictionClassif[20]>
#> classif.ce
#> 1: 0.08695652
#> 2: 0.05217391
#> 3: 0.06140351
# unbiased performance of the final model trained on the full data set
rr$aggregate()
#> classif.ce
#> 0.06684465
# }