Feature Selection with Recursive Feature Elimination

Feature selection using the Recursive Feature Elimination (RFE) algorithm. Recursive feature elimination iteratively removes features with a low importance score. Only works with mlr3::Learners that can calculate importance scores (see the section on optional extractors in mlr3::Learner).

Source

Guyon I, Weston J, Barnhill S, Vapnik V (2002). “Gene Selection for Cancer Classification using Support Vector Machines.” Machine Learning, 46(1), 389–422. ISSN 1573-0565, doi:10.1023/A:1012487302797 .

Details

The learner is trained on all features at the start and importance scores are calculated for each feature. Then the least important feature is removed and the learner is trained on the reduced feature set. The importance scores are calculated again and the procedure is repeated until the desired number of features is reached. The non-recursive option (recursive = FALSE) only uses the importance scores calculated in the first iteration.

The feature selection terminates itself when n_features is reached. It is not necessary to set a termination criterion.

When using a cross-validation resampling strategy, the importance scores of the resampling iterations are aggregated. The parameter aggregation determines how the importance scores are aggregated. By default ("rank"), the importance score vector of each fold is ranked and the feature with the lowest average rank is removed. The option "mean" averages the score of each feature across the resampling iterations and removes the feature with the lowest average score. Averaging the scores is not appropriate for most importance measures.

Resources

The gallery features a collection of case studies and demos about optimization.

Utilize the built-in feature importance of models with Recursive Feature Elimination.

Dictionary

This FSelector can be instantiated with the associated sugar function fs():

fs("rfe")

Control Parameters

n_features: integer(1)
The minimum number of features to select, by default half of the features.
feature_fraction: double(1)
Fraction of features to retain in each iteration. The default of 0.5 retains half of the features.
feature_number: integer(1)
Number of features to remove in each iteration.
subset_sizes: integer()
Vector of the number of features to retain in each iteration. Must be sorted in decreasing order.
recursive: logical(1)
If TRUE (default), the feature importance is calculated in each iteration.
aggregation: character(1)
The aggregation method for the importance scores of the resampling iterations. See details.

The parameter feature_fraction, feature_number and subset_sizes are mutually exclusive.

Super classes

mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchRFE

Methods

Inherited methods

Method `new()`

Creates a new instance of this R6 class.

Usage

FSelectorBatchRFE$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

FSelectorBatchRFE$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

# Feature Selection
# \donttest{

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("rfe"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  store_models = TRUE
)

# best performing feature subset
instance$result
#>    bill_depth bill_length body_mass flipper_length island    sex   year
#>        <lgcl>      <lgcl>    <lgcl>         <lgcl> <lgcl> <lgcl> <lgcl>
#> 1:       TRUE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#>    importance                              features n_features classif.ce
#>        <list>                                <list>      <int>      <num>
#> 1:      3,2,1 bill_depth,bill_length,flipper_length          3 0.07826087

# all evaluated feature subsets
as.data.table(instance$archive)
#>    bill_depth bill_length body_mass flipper_length island    sex   year
#>        <lgcl>      <lgcl>    <lgcl>         <lgcl> <lgcl> <lgcl> <lgcl>
#> 1:       TRUE        TRUE      TRUE           TRUE   TRUE   TRUE   TRUE
#> 2:       TRUE        TRUE     FALSE           TRUE  FALSE  FALSE  FALSE
#>    classif.ce runtime_learners           timestamp batch_nr warnings errors
#>         <num>            <num>              <POSc>    <int>    <int>  <int>
#> 1: 0.07826087            0.007 2025-01-16 10:17:55        1        0      0
#> 2: 0.07826087            0.007 2025-01-16 10:17:55        2        0      0
#>         importance
#>             <list>
#> 1: 7,6,5,4,3,2,...
#> 2:           3,2,1
#>                                                          features n_features
#>                                                            <list>     <list>
#> 1: bill_depth,bill_length,body_mass,flipper_length,island,sex,...          7
#> 2:                          bill_depth,bill_length,flipper_length          3
#>     resample_result
#>              <list>
#> 1: <ResampleResult>
#> 2: <ResampleResult>

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }

Source

Details

Archive

Resources

Dictionary

Control Parameters

See also

Super classes

Methods

Public methods

Method `new()`

Usage

Method `clone()`

Usage

Arguments

Examples

Feature Selection with Recursive Feature Elimination

Source

Details

Archive

Resources

Dictionary

Control Parameters

See also

Super classes

Methods

Public methods

Method new()

Usage

Method clone()

Usage

Arguments

Examples

Method `new()`

Method `clone()`