Recursive feature elimination iteratively removes features with a low importance score.

The learner is trained on all features at the start, and an importance score is calculated for each feature (see the section on optional extractors in Learner). The least important features are then removed and the learner is trained on the reduced feature set. The importance scores are recalculated and the procedure repeats until the desired number of features is reached. The non-recursive variant (recursive = FALSE) only uses the importance scores calculated in the first iteration.
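
For illustration, a minimal sketch of the step that rfe performs at the start of each iteration: train on all features and query the learner's importance() method (here with classif.rpart, which supports the importance property):

library(mlr3)

task = tsk("pima")
learner = lrn("classif.rpart")

# fit on all features and extract the importance scores
learner$train(task)
learner$importance()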

The feature selection terminates on its own once n_features is reached; it is not necessary to set a termination criterion.

Dictionary

This FSelector can be instantiated via the dictionary mlr_fselectors or with the associated sugar function fs():

mlr_fselectors$get("rfe")
fs("rfe")

Parameters

n_features

integer(1)
The number of features to select. By default, half of the features are selected.

feature_fraction

double(1)
Fraction of features to retain in each iteration. The default of 0.5 retains half of the features.

feature_number

integer(1)
Number of features to remove in each iteration.

subset_sizes

integer()
Vector of the numbers of features to retain in each iteration. Must be sorted in decreasing order.

recursive

logical(1)
If TRUE (default), the feature importance is calculated in each iteration.

The parameters feature_fraction, feature_number and subset_sizes are mutually exclusive.
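
For example, the elimination schedule can be configured in one of the following ways (a sketch; the values are arbitrary):

# halve the feature set in each iteration until 4 features are left
fs("rfe", n_features = 4, feature_fraction = 0.5)

# remove exactly one feature per iteration
fs("rfe", n_features = 4, feature_number = 1)

# specify the subset sizes directly
fs("rfe", subset_sizes = c(6, 4, 2))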

Super class

mlr3fselect::FSelector -> FSelectorRFE

Public fields

importance

numeric()
Stores the feature importance scores of the model trained on all variables if recursive is set to FALSE.
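
A sketch of how this field might be inspected after an optimization run with recursive = FALSE; the instance setup mirrors the example below, and trm("none") relies on rfe terminating on its own:

library(mlr3)
library(mlr3fselect)

fselector = fs("rfe", recursive = FALSE)

instance = FSelectInstanceSingleCrit$new(
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = trm("none")
)

fselector$optimize(instance)

# importance scores of the initial model fitted on all features
fselector$importance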

Methods

Method new()

Creates a new instance of this R6 class.

Usage

FSelectorRFE$new()

Method clone()

The objects of this class are cloneable with this method.

Usage

FSelectorRFE$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# retrieve task
task = tsk("pima")

# load learner
learner = lrn("classif.rpart")

# \donttest{
# feature selection on the pima indians diabetes data set
instance = fselect(
  method = "rfe",
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  store_models = TRUE
)

# best performing feature subset
instance$result
#>     age glucose insulin mass pedigree pregnant pressure triceps
#> 1: TRUE    TRUE    TRUE TRUE     TRUE     TRUE     TRUE    TRUE
#>                                          features classif.ce
#> 1: age,glucose,insulin,mass,pedigree,pregnant,...  0.2617188

# all evaluated feature subsets
as.data.table(instance$archive)
#>     age glucose insulin mass pedigree pregnant pressure triceps classif.ce
#> 1: TRUE    TRUE    TRUE TRUE     TRUE     TRUE     TRUE    TRUE  0.2617188
#> 2: TRUE    TRUE   FALSE TRUE     TRUE    FALSE    FALSE   FALSE  0.2617188
#>    runtime_learners           timestamp batch_nr
#> 1:            0.103 2022-08-25 10:41:01        1
#> 2:            0.091 2022-08-25 10:41:02        2
#>                                                         importance
#> 1: 57.430135,14.047061,12.133200, 8.675049, 7.561779, 3.360167,...
#> 2:                             57.57762,14.89596,14.48716,11.46111
#>         resample_result
#> 1: <ResampleResult[21]>
#> 2: <ResampleResult[21]>

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
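
# as a possible follow-up (a sketch, not part of the original example):
# the final model can be used for prediction; note that scoring on the
# training task yields the resubstitution error, not an unbiased estimate
prediction = learner$predict(task)
prediction$score(msr("classif.ce"))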
# }