Embedded Ensemble Feature Selection — embedded_ensemble_fselect

Ensemble feature selection using multiple learners. The method identifies the most predictive features in a dataset by combining several machine learning models with resampling. Returns an EnsembleFSResult.

Usage

embedded_ensemble_fselect(
  task,
  learners,
  init_resampling,
  measure,
  store_benchmark_result = TRUE
)

Source

Meinshausen, Nicolai, Bühlmann, Peter (2010). “Stability Selection.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(4), 417–473. ISSN 1369-7412, doi:10.1111/j.1467-9868.2010.00740.x, arXiv:0809.2932.

Hedou, Julien, Maric, Ivana, Bellan, Gregoire, Einhaus, Jakob, Gaudilliere, K. D, Ladant, Xavier F, Verdonk, Franck, Stelzer, A. I, Feyaerts, Dorien, Tsai, S. A, Ganio, A. E, Sabayev, Maximilian, Gillard, Joshua, Amar, Jonas, Cambriel, Amelie, Oskotsky, T. T, Roldan, Alennie, Golob, L. J, Sirota, Marina, Bonham, A. T, Sato, Masaki, Diop, Maigane, Durand, Xavier, Angst, S. M, Stevenson, K. D, Aghaeepour, Nima, Montanari, Andrea, Gaudilliere, Brice (2024). “Discovery of sparse, reliable omic biomarkers with Stabl.” Nature Biotechnology 2024, 1–13. ISSN 1546-1696, doi:10.1038/s41587-023-02033-x , https://www.nature.com/articles/s41587-023-02033-x.

Arguments

task: (mlr3::Task)
Task to operate on.
learners: (list of mlr3::Learner)
The learners to be used for feature selection. All learners must have the selected_features property, i.e. implement embedded feature selection (e.g. regularized models).
init_resampling: (mlr3::Resampling)
The initial resampling strategy applied to the data. Each train set is passed to the learners, and each test set is used for prediction. Must be either mlr3::ResamplingSubsampling or mlr3::ResamplingBootstrap.
measure: (mlr3::Measure)
The measure used to score each learner on the test sets generated by init_resampling. If NULL, the default measure is used.
store_benchmark_result: (logical(1))
Whether to store the benchmark result in the returned EnsembleFSResult.
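
The requirement on learners can be checked before calling the function. The following is a minimal sketch, assuming the mlr3 package is installed; the two learners are only illustrative, and supports_fs is a hypothetical variable name.

```r
library(mlr3)

# Candidate learners; this particular choice is illustrative only.
learners = lrns(c("classif.rpart", "classif.featureless"))

# TRUE for every learner that advertises the "selected_features" property.
supports_fs = vapply(
  learners,
  function(l) "selected_features" %in% l$properties,
  logical(1)
)
supports_fs
```

Learners for which this check is FALSE do not satisfy the requirement stated for the learners argument.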

Value

An EnsembleFSResult object.

Details

The method first applies the user-specified initial resampling to create multiple train/test splits (subsamples) of the original dataset. Resampling yields diverse subsets of the data, which makes the feature selection more robust.
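
As an illustration of this first step, the sketch below instantiates a subsampling strategy on a task and extracts the row ids of one split. It assumes the mlr3 package is installed; the ratio of 0.8 and the variable names are chosen for the example and are not prescribed by the function.

```r
library(mlr3)

task = tsk("sonar")

# Subsampling: 5 random train/test splits, 80% of the rows used for training.
resampling = rsmp("subsampling", repeats = 5, ratio = 0.8)
resampling$instantiate(task)

# Row ids of the first split; train and test sets are disjoint.
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)

length(train_ids)
length(test_ids)
```

With rsmp("bootstrap") the train sets are instead drawn with replacement, so a row can appear more than once in a train set.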

For each train set generated in the previous step, the method trains every learner that supports embedded feature selection and stores the features selected during training. Each trained learner is then scored on the corresponding test set, so the selected features and a score are recorded for every combination of subsample and learner.
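
The train, select, and score cycle described above can be sketched by hand for a single learner. This is only an illustration of the idea under the assumption that mlr3 and rpart are installed; it is not the package's internal implementation, and the results list is a hypothetical structure.

```r
library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart")
measure = msr("classif.ce")

resampling = rsmp("subsampling", repeats = 3)
resampling$instantiate(task)

# For each split: train on the train set, record the selected features,
# then score the predictions made on the test set.
results = lapply(seq_len(resampling$iters), function(i) {
  l = learner$clone(deep = TRUE)
  l$train(task, row_ids = resampling$train_set(i))
  pred = l$predict(task, row_ids = resampling$test_set(i))
  list(
    iteration = i,
    features = l$selected_features(),
    score = unname(pred$score(measure))
  )
})

results[[1]]$features
```

Repeating this loop for every learner gives one set of selected features and one score per combination of subsample and learner.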

Results are stored in an EnsembleFSResult.
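
A short sketch of how the returned object might be inspected follows. It assumes that mlr3 and mlr3fselect are installed and that the installed version of EnsembleFSResult provides the result field and the feature_ranking() method; check the EnsembleFSResult help page of your version before relying on either.

```r
library(mlr3)
library(mlr3fselect)

eefsr = embedded_ensemble_fselect(
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 5),
  measure = msr("classif.ce")
)

# One row per combination of resampling iteration and learner.
res = eefsr$result
nrow(res)

# Features ranked across all subsamples and learners
# (the default ranking method depends on the package version).
ranking = eefsr$feature_ranking()
head(ranking)
```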

Examples

# \donttest{
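  # Attach the packages that provide tsk(), lrns(), rsmp(), msr(),
  # and embedded_ensemble_fselect()
  library(mlr3)
  library(mlr3fselect)

  eefsr = embedded_ensemble_fselect(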
    task = tsk("sonar"),
    learners = lrns(c("classif.rpart", "classif.featureless")),
    init_resampling = rsmp("subsampling", repeats = 5),
    measure = msr("classif.ce")
  )
  eefsr
#> 
#> ── <EnsembleFSResult> with 2 learners and 5 initial resamplings ────────────────
#>     resampling_iteration          learner_id n_features
#>                    <int>              <char>      <int>
#>  1:                    1       classif.rpart          4
#>  2:                    2       classif.rpart          4
#>  3:                    3       classif.rpart          6
#>  4:                    4       classif.rpart          3
#>  5:                    5       classif.rpart          5
#>  6:                    1 classif.featureless          0
#>  7:                    2 classif.featureless          0
#>  8:                    3 classif.featureless          0
#>  9:                    4 classif.featureless          0
#> 10:                    5 classif.featureless          0
# }