Feature selection using the Recursive Feature Elimination (RFE) algorithm. RFE iteratively removes the features with the lowest importance scores. It only works with Learners that can calculate importance scores (see the section on optional extractors in Learner).

Details

The learner is trained on all features at the start, and importance scores are calculated for each feature. Then the least important feature is removed and the learner is trained on the reduced feature set. The importance scores are calculated again and the procedure is repeated until the desired number of features is reached. The non-recursive option (recursive = FALSE) only uses the importance scores calculated in the first iteration.
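The recursive loop can be sketched in plain R. This is an illustrative toy, not the mlr3fselect implementation: the "importance score" here is mocked as the absolute correlation of each feature with a target, and the feature names, data, and stopping size are invented for the example.

```r
# Toy recursive feature elimination with a mocked importance score
set.seed(1)
n = 100
X = data.frame(
  a = rnorm(n),
  b = rnorm(n),
  c = rnorm(n)
)
y = 2 * X$a + X$b + rnorm(n, sd = 0.1)  # 'c' is pure noise

features = names(X)
n_features = 2  # terminate once this many features remain

while (length(features) > n_features) {
  # mock importance score, recomputed on the current feature set
  imp = vapply(features, function(f) abs(cor(X[[f]], y)), numeric(1))
  # drop the least important feature and "retrain" on the rest
  features = setdiff(features, names(which.min(imp)))
}
features
```

With recursive = FALSE, the `imp` vector would instead be computed once before the loop and reused in every iteration.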

The feature selection terminates on its own once n_features is reached; it is not necessary to set a termination criterion.

Dictionary

This FSelector can be instantiated with the associated sugar function fs():

fs("rfe")

Control Parameters

n_features

integer(1)
The number of features to select. By default half of the features are selected.

feature_fraction

double(1)
Fraction of features to retain in each iteration. The default 0.5 retains half of the features.

feature_number

integer(1)
Number of features to remove in each iteration.

subset_sizes

integer()
Vector of number of features to retain in each iteration. Must be sorted in decreasing order.

recursive

logical(1)
If TRUE (default), the feature importance is calculated in each iteration.

The parameters feature_fraction, feature_number and subset_sizes are mutually exclusive.
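As an illustrative sketch (not mlr3fselect internals), the three settings produce different schedules of feature-set sizes. Assuming a hypothetical start of 8 features and n_features = 2:

```r
# Hypothetical subset-size schedules for the three mutually exclusive settings
n = 8; n_features = 2

# feature_fraction = 0.5: keep half of the features each iteration
sizes_fraction = n
while (floor(tail(sizes_fraction, 1) * 0.5) > n_features)
  sizes_fraction = c(sizes_fraction, floor(tail(sizes_fraction, 1) * 0.5))
sizes_fraction = c(sizes_fraction, n_features)
sizes_fraction  # 8 4 2

# feature_number = 2: remove a fixed number of features each iteration
sizes_number = seq(n, n_features, by = -2)
sizes_number    # 8 6 4 2

# subset_sizes: an explicit schedule, sorted in decreasing order
sizes_custom = c(8, 5, 3, 2)
```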

Super class

mlr3fselect::FSelector -> FSelectorRFE

Public fields

importance

numeric()
Stores the feature importance of the model trained on all variables if recursive is set to FALSE.

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.

Usage

FSelectorRFE$new()


Method clone()

The objects of this class are cloneable with this method.

Usage

FSelectorRFE$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Feature Selection
# \donttest{

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  method = fs("rfe"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  store_models = TRUE
)

# best performing feature subset
instance$result
#>    bill_depth bill_length body_mass flipper_length island  sex year
#> 1:       TRUE        TRUE      TRUE           TRUE   TRUE TRUE TRUE
#>                                                          features classif.ce
#> 1: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 0.05217391

# all evaluated feature subsets
as.data.table(instance$archive)
#>    bill_depth bill_length body_mass flipper_length island   sex  year
#> 1:       TRUE        TRUE      TRUE           TRUE   TRUE  TRUE  TRUE
#> 2:       TRUE        TRUE     FALSE           TRUE  FALSE FALSE FALSE
#>    classif.ce runtime_learners           timestamp batch_nr warnings errors
#> 1: 0.05217391            0.063 2022-11-25 12:09:53        1        0      0
#> 2: 0.05217391            0.059 2022-11-25 12:09:53        2        0      0
#>                                                   importance
#> 1: 81.34133,78.81522,67.33453,58.64788,45.40653, 0.00000,...
#> 2:                                81.34133,78.81522,67.33453
#>         resample_result
#> 1: <ResampleResult[21]>
#> 2: <ResampleResult[21]>

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }