Feature Selection with Recursive Feature Elimination with Cross Validation

Feature selection using the Recursive Feature Elimination with Cross-Validation (RFE-CV) algorithm. See FSelectorBatchRFE for a description of the base algorithm. RFE-CV runs a recursive feature elimination in each iteration of a cross-validation to determine the optimal number of features. A recursive feature elimination is then run again on the complete data set, with the optimal number of features as the final feature set size. The performance of the optimal feature set is calculated on the complete data set, i.e. on the same data that was used for training, so it is optimistically biased and should not be reported as the performance of the final model. RFE-CV only works with mlr3::Learners that can calculate importance scores (see the section on optional extractors in mlr3::Learner).
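The two stages described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper functions importance() and rfe(), the simulated data, the absolute-correlation importance score and the linear model are all hypothetical stand-ins for a learner's importance extractor and an mlr3 measure.

```r
# Toy stand-in for a learner's importance extractor (an assumption of this
# sketch): absolute correlation of each feature with the target, sorted
# from most to least important.
importance = function(data, features, target) {
  scores = sapply(features, function(f) abs(cor(data[[f]], data[[target]])))
  sort(scores, decreasing = TRUE)
}

# Plain recursive feature elimination: for each subset size (in decreasing
# order), recompute the importance on the current features and keep the top ones.
rfe = function(data, features, target, sizes) {
  subsets = list()
  for (size in sizes) {
    features = names(importance(data, features, target))[seq_len(size)]
    subsets[[as.character(size)]] = features
  }
  subsets
}

# Simulated data: only x1 and x2 carry signal.
set.seed(1)
n = 150
data = data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
data$y = 2 * data$x1 + data$x2 + rnorm(n)

features = c("x1", "x2", "x3", "x4")
sizes = c(4, 2, 1)
folds = sample(rep(1:3, length.out = n))

# Stage 1: run RFE inside each fold and score every subset size on the test fold.
cv_error = sapply(1:3, function(k) {
  train = data[folds != k, ]
  test = data[folds == k, ]
  subsets = rfe(train, features, "y", sizes)
  sapply(subsets, function(s) {
    fit = lm(reformulate(s, response = "y"), data = train)
    mean((predict(fit, newdata = test) - test$y)^2)
  })
})

# Average over folds (rows are subset sizes) and pick the best size.
best_size = sizes[which.min(rowMeans(cv_error))]

# Stage 2: rerun RFE on the complete data set down to the best size.
final = rfe(data, features, "y", sizes[sizes >= best_size])
final_features = final[[as.character(best_size)]]
final_features
```

The cross-validation only decides how many features to keep. Which features are kept is decided by the final run on the complete data set.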

Details

The resampling strategy is changed during the feature selection. The resampling strategy passed to the instance (resampling) is used to determine the optimal number of features. Usually, a cross-validation strategy is used and a recursive feature elimination is run in each iteration of the cross-validation. Internally, mlr3::ResamplingCustom is used to emulate this part of the algorithm. In the final recursive feature elimination run, the resampling strategy is changed to mlr3::ResamplingInsample, i.e. the complete data set is used for both training and testing.
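To see why the score of the final run is not an honest performance estimate, it can help to inspect the in-sample resampling directly. The following is a small sketch that assumes the mlr3 package is installed; it only illustrates mlr3::ResamplingInsample and does not reproduce the internals of the feature selector.

```r
library(mlr3)

task = tsk("penguins")

# "insample" resampling: a single iteration that trains and tests on all rows
resampling = rsmp("insample")
resampling$instantiate(task)

train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)

# both sets cover the complete task
setequal(train_ids, task$row_ids)
setequal(test_ids, task$row_ids)
```

Because the training and test rows are identical, any score computed with this strategy is a training error.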

The feature selection terminates by itself once the optimal number of features is reached, so it is not necessary to set a termination criterion.

Resources

The gallery features a collection of case studies and demos about optimization.

Utilize the built-in feature importance of models with Recursive Feature Elimination.

Dictionary

This FSelector can be instantiated with the associated sugar function fs():

fs("rfecv")
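As a brief illustration, assuming the mlr3fselect package is installed, the key "rfecv" used in the Examples section can be resolved either through the sugar function or through the mlr_fselectors dictionary:

```r
library(mlr3fselect)

# sugar function
fselector = fs("rfecv")

# equivalent lookup in the dictionary of feature selectors
fselector_dict = mlr_fselectors$get("rfecv")

# both calls return an object of the class documented on this page
class(fselector)[1]
class(fselector_dict)[1]
```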

Control Parameters

n_features: integer(1)
The number of features to select. By default half of the features are selected.
feature_fraction: double(1)
Fraction of features to retain in each iteration. The default 0.5 retains half of the features.
feature_number: integer(1)
Number of features to remove in each iteration.
subset_sizes: integer()
Vector of number of features to retain in each iteration. Must be sorted in decreasing order.
recursive: logical(1)
If TRUE (default), the feature importance is calculated in each iteration.

The parameters feature_fraction, feature_number and subset_sizes are mutually exclusive.
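The control parameters can be passed to the sugar function when the feature selector is constructed. The sketch below assumes mlr3fselect is installed and sets only one of the three mutually exclusive parameters per object; the values themselves are arbitrary.

```r
library(mlr3fselect)

# remove one feature per iteration until two features remain
fselector = fs("rfecv", n_features = 2L, feature_number = 1L)

# list the available control parameters and the values set above
fselector$param_set$ids()
fselector$param_set$values

# alternatively, give the subset sizes explicitly, in decreasing order
fselector_sizes = fs("rfecv", subset_sizes = c(5L, 3L, 1L))
fselector_sizes$param_set$values$subset_sizes
```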

Super classes

mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchRFECV

Methods

Method `new()`

Creates a new instance of this R6 class.

Usage

FSelectorBatchRFECV$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

FSelectorBatchRFECV$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

# Feature Selection
# \donttest{

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("rfecv"),
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  store_models = TRUE
)

# best performing feature subset
instance$result
#>    bill_depth bill_length body_mass flipper_length island    sex   year
#>        <lgcl>      <lgcl>    <lgcl>         <lgcl> <lgcl> <lgcl> <lgcl>
#> 1:       TRUE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#>                                 features n_features classif.ce
#>                                   <list>      <int>      <num>
#> 1: bill_depth,bill_length,flipper_length          3  0.0377907

# all evaluated feature subsets
as.data.table(instance$archive)
#>    bill_depth bill_length body_mass flipper_length island    sex   year
#>        <lgcl>      <lgcl>    <lgcl>         <lgcl> <lgcl> <lgcl> <lgcl>
#> 1:       TRUE        TRUE      TRUE           TRUE   TRUE   TRUE   TRUE
#> 2:       TRUE        TRUE      TRUE           TRUE   TRUE   TRUE   TRUE
#> 3:       TRUE        TRUE      TRUE           TRUE   TRUE   TRUE   TRUE
#> 4:       TRUE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#> 5:       TRUE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#> 6:      FALSE        TRUE      TRUE           TRUE  FALSE  FALSE  FALSE
#> 7:       TRUE        TRUE      TRUE           TRUE   TRUE   TRUE   TRUE
#> 8:       TRUE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#>    classif.ce runtime_learners           timestamp batch_nr warnings errors
#>         <num>            <num>              <POSc>    <int>    <int>  <int>
#> 1: 0.06956522            0.006 2025-07-10 08:48:21        1        0      0
#> 2: 0.05217391            0.006 2025-07-10 08:48:21        1        0      0
#> 3: 0.08771930            0.006 2025-07-10 08:48:21        1        0      0
#> 4: 0.07826087            0.005 2025-07-10 08:48:21        2        0      0
#> 5: 0.05217391            0.005 2025-07-10 08:48:21        2        0      0
#> 6: 0.10526316            0.008 2025-07-10 08:48:21        2        0      0
#> 7: 0.03488372            0.006 2025-07-10 08:48:21        3        0      0
#> 8: 0.03779070            0.005 2025-07-10 08:48:22        4        0      0
#>                                                         importance iteration
#>                                                             <list>     <int>
#> 1:       83.55208,79.41274,60.83833,56.29083,52.08027, 0.00000,...         1
#> 2:       87.52508,84.04041,75.77964,61.37099,52.26846, 0.00000,...         2
#> 3:       95.67212,86.23182,81.15089,79.80015,75.09393, 0.00000,...         3
#> 4:                                      83.55208,79.41274,60.83833         1
#> 5:                                      87.52508,84.04041,75.77964         2
#> 6:                                      94.99631,85.32808,73.19035         3
#> 7: 124.20793,121.52400,102.74919, 87.26186, 78.61700,  0.00000,...        NA
#> 8:                                      124.2079,121.5240,104.2507        NA
#>                                                          features n_features
#>                                                            <list>     <list>
#> 1: bill_depth,bill_length,body_mass,flipper_length,island,sex,...          7
#> 2: bill_depth,bill_length,body_mass,flipper_length,island,sex,...          7
#> 3: bill_depth,bill_length,body_mass,flipper_length,island,sex,...          7
#> 4:                          bill_depth,bill_length,flipper_length          3
#> 5:                          bill_depth,bill_length,flipper_length          3
#> 6:                           bill_length,body_mass,flipper_length          3
#> 7: bill_depth,bill_length,body_mass,flipper_length,island,sex,...          7
#> 8:                          bill_depth,bill_length,flipper_length          3
#>     resample_result
#>              <list>
#> 1: <ResampleResult>
#> 2: <ResampleResult>
#> 3: <ResampleResult>
#> 4: <ResampleResult>
#> 5: <ResampleResult>
#> 6: <ResampleResult>
#> 7: <ResampleResult>
#> 8: <ResampleResult>

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }
