Ensemble feature selection using multiple learners. The method identifies the most informative features in a dataset by combining several machine learning models with resampling. Returns an EnsembleFSResult.

Usage

ensemble_fselect(
  fselector,
  task,
  learners,
  init_resampling,
  inner_resampling,
  measure,
  terminator,
  callbacks = NULL,
  store_benchmark_result = TRUE,
  store_models = TRUE
)

Source

Saeys, Yvan, Abeel, Thomas, Van De Peer, Yves (2008). “Robust feature selection using ensemble feature selection techniques.” Machine Learning and Knowledge Discovery in Databases, 5212 LNAI, 313–325. doi:10.1007/978-3-540-87481-2_21.

Abeel, Thomas, Helleputte, Thibault, Van de Peer, Yves, Dupont, Pierre, Saeys, Yvan (2010). “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.” Bioinformatics, 26, 392–398. ISSN 1367-4803, doi:10.1093/BIOINFORMATICS/BTP630.

Pes, Barbara (2020). “Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains.” Neural Computing and Applications, 32(10), 5951–5973. ISSN 14333058, doi:10.1007/s00521-019-04082-3.

Arguments

fselector

(FSelector)
Optimization algorithm.

task

(mlr3::Task)
Task to operate on.

learners

(list of mlr3::Learner)
The learners to be used for feature selection.

init_resampling

(mlr3::Resampling)
The initial resampling strategy of the data, from which each train set will be passed on to the learners. Can only be mlr3::ResamplingSubsampling or mlr3::ResamplingBootstrap.

inner_resampling

(mlr3::Resampling)
The inner resampling strategy used by the FSelector.

measure

(mlr3::Measure)
Measure to optimize. If NULL, default measure is used.

terminator

(bbotk::Terminator)
Stop criterion of the feature selection.

callbacks

(list of lists of CallbackBatchFSelect)
Callbacks to be used for each learner. The outer list must have one element per learner, and each element is itself a list of callbacks for that learner.

store_benchmark_result

(logical(1))
Whether to store the benchmark result in EnsembleFSResult or not.

store_models

(logical(1))
Whether to store models in auto_fselector or not.

Value

An EnsembleFSResult object.

Details

The method begins by applying the user-specified initial resampling (init_resampling) to create multiple subsamples of the original dataset. Running feature selection on different subsets of the data makes the selected features less dependent on any single sample of the observations.

For each subsample generated in the previous step, the method performs wrapper-based feature selection (auto_fselector) with each provided learner, using the given inner resampling method, performance measure and optimization algorithm. This yields the best feature subset for each combination of subsample and learner. Results are stored in an EnsembleFSResult.
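The two steps above can be sketched as an explicit loop. This is only an illustration of the described procedure, not the function's internal implementation: the loop structure, the variable names (`sub_task`, `rows`, `sketch_result`) and the reduced `n_evals = 5` budget are assumptions made for the example.

```r
library(mlr3fselect)  # also attaches mlr3

set.seed(1)
task = tsk("sonar")
learners = lrns(c("classif.rpart", "classif.featureless"))

# Step 1: the initial resampling creates the subsamples
init_resampling = rsmp("subsampling", repeats = 2)
init_resampling$instantiate(task)

# Step 2: one wrapper-based feature selection per (subsample, learner) pair
rows = list()
for (i in seq_len(init_resampling$iters)) {
  # restrict a copy of the task to the training rows of subsample i
  sub_task = task$clone()$filter(init_resampling$train_set(i))
  for (learner in learners) {
    afs = auto_fselector(
      fselector = fs("random_search"),
      learner = learner$clone(),
      resampling = rsmp("cv", folds = 3),   # inner resampling
      measure = msr("classif.ce"),
      terminator = trm("evals", n_evals = 5)
    )
    afs$train(sub_task)
    # best feature subset found for this subsample and learner
    best = afs$fselect_instance$result_feature_set
    rows[[length(rows) + 1L]] = data.frame(
      resampling_iteration = i,
      learner_id = learner$id,
      n_features = length(best)
    )
  }
}
sketch_result = do.call(rbind, rows)
sketch_result
```

The resulting table has one row per combination of subsample and learner, which mirrors the layout printed for an EnsembleFSResult.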

Examples

library(mlr3fselect)  # also attaches mlr3 (tsk, lrns, rsmp, msr)

efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10)
)
efsr
#> <EnsembleFSResult>
#>    resampling_iteration          learner_id n_features
#>                   <int>              <char>      <int>
#> 1:                    1       classif.rpart          1
#> 2:                    1 classif.featureless          9
#> 3:                    2       classif.rpart         51
#> 4:                    2 classif.featureless          5
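Once the result is available, the per-combination feature subsets can be aggregated. The sketch below assumes that the EnsembleFSResult exposes a `$result` table with a `features` list column and a `$feature_ranking()` method, as in the mlr3fselect version this page appears to document; check `?EnsembleFSResult` for the installed version. The manual frequency count is only an illustration of how the subsets could be summarised, and `n_evals = 5` is chosen to keep the run short.

```r
library(mlr3fselect)  # also attaches mlr3

set.seed(1)
efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 5)
)

# one row per (subsample, learner) combination
res = efsr$result

# manual aggregation: how often was each feature selected across all rows?
freq = sort(table(unlist(res$features)), decreasing = TRUE)
head(freq)

# built-in aggregation with the method's default settings
ranking = efsr$feature_ranking()
head(ranking)
```

Features that are selected in many combinations of subsample and learner are the ones the ensemble treats as most informative.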