Feature selection using the Recursive Feature Elimination (RFE) algorithm. Recursive feature elimination iteratively removes the features with the lowest importance scores. Only works with Learners that can calculate importance scores (see the section on optional extractors in Learner).
Details
The learner is trained on all features at the start and importance scores are calculated for each feature.
Then the least important feature is removed and the learner is trained on the reduced feature set.
The importance scores are calculated again and the procedure is repeated until the desired number of features is reached.
The non-recursive option (recursive = FALSE) only uses the importance scores calculated in the first iteration.
The feature selection terminates itself when n_features is reached. It is not necessary to set a termination criterion.
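The sequence of subset sizes that results from retaining a fraction of the features in each iteration can be sketched in base R. The helper `rfe_subset_sizes` below is hypothetical and for illustration only; mlr3fselect computes its schedule internally and may differ in detail:

```r
# Hypothetical helper (not part of mlr3fselect): compute the
# feature-subset sizes visited when a fraction of features is
# retained per iteration, stopping at n_features.
rfe_subset_sizes = function(n, n_features, feature_fraction = 0.5) {
  sizes = n
  while (floor(tail(sizes, 1) * feature_fraction) > n_features) {
    sizes = c(sizes, floor(tail(sizes, 1) * feature_fraction))
  }
  c(sizes, n_features)
}

# Starting from 7 features (as in the penguins task below) and
# selecting down to 2 features, retaining half per iteration:
rfe_subset_sizes(7, n_features = 2)
#> [1] 7 3 2
```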
Control Parameters
n_features
integer(1)
The number of features to select. By default half of the features are selected.
feature_fraction
double(1)
Fraction of features to retain in each iteration. The default of 0.5 retains half of the features.
feature_number
integer(1)
Number of features to remove in each iteration.
subset_sizes
integer()
Vector of the number of features to retain in each iteration. Must be sorted in decreasing order.
recursive
logical(1)
If TRUE (default), the feature importance is calculated in each iteration.
The parameters feature_fraction, feature_number and subset_sizes are mutually exclusive.
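For instance, a recursive RFE that removes one feature per iteration until four features remain could be configured as follows (a sketch using the parameters documented above; the chosen values are illustrative):

```r
library(mlr3fselect)

# Select down to 4 features, removing one feature per iteration and
# recalculating the importance scores after each removal.
fselector = fs("rfe",
  n_features = 4,
  feature_number = 1,
  recursive = TRUE
)
```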
Super class
mlr3fselect::FSelector
-> FSelectorRFE
Public fields
importance
numeric()
Stores the feature importance of the model with all variables if recursive is set to FALSE.
Examples
# Feature Selection
# \donttest{
# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")
# run feature selection on the Palmer Penguins data set
instance = fselect(
method = fs("rfe"),
task = task,
learner = learner,
resampling = rsmp("holdout"),
measure = msr("classif.ce"),
store_models = TRUE
)
# best performing feature subset
instance$result
#> bill_depth bill_length body_mass flipper_length island sex year
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> features classif.ce
#> 1: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 0.04347826
# all evaluated feature subsets
as.data.table(instance$archive)
#> bill_depth bill_length body_mass flipper_length island sex year
#> 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 2: TRUE TRUE FALSE TRUE FALSE FALSE FALSE
#> classif.ce runtime_learners timestamp batch_nr warnings errors
#> 1: 0.04347826 0.008 2023-01-26 18:34:02 1 0 0
#> 2: 0.04347826 0.007 2023-01-26 18:34:03 2 0 0
#> importance
#> 1: 86.18556,78.19177,61.61354,57.01552,42.30183, 0.00000,...
#> 2: 86.18556,78.19177,61.61354
#> resample_result
#> 1: <ResampleResult[21]>
#> 2: <ResampleResult[21]>
# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }