
Feature Selection with Random Search
Source: R/FSelectorBatchRandomSearch.R
mlr_fselectors_random_search.Rd
Feature selection using the random search algorithm.
Source
Bergstra J, Bengio Y (2012). “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research, 13(10), 281–305. https://jmlr.csail.mit.edu/papers/v13/bergstra12a.html.
Details
The feature sets are randomly drawn.
The sets are evaluated in batches of size batch_size.
Larger batches allow more parallelization, while smaller batches allow the termination criteria to be checked at a finer granularity.
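The batching behaviour described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helpers draw_feature_sets(), random_search() and toy_score(), the feature names f1 to f5, and the sampling scheme (subset size drawn uniformly, then features drawn without replacement) are all assumptions made for the example.

```r
# Hypothetical helper: draw `batch_size` random feature sets as rows of a logical matrix.
draw_feature_sets = function(feature_names, batch_size, max_features = length(feature_names)) {
  p = length(feature_names)
  sets = matrix(FALSE, nrow = batch_size, ncol = p, dimnames = list(NULL, feature_names))
  for (i in seq_len(batch_size)) {
    n = sample.int(max_features, 1L)   # assumed scheme: size between 1 and max_features
    sets[i, sample.int(p, n)] = TRUE   # switch on n distinct features
  }
  sets
}

# Hypothetical driver: evaluate whole batches until the evaluation budget is used up.
random_search = function(feature_names, score, batch_size, max_evals,
                         max_features = length(feature_names)) {
  archive = NULL
  scores = numeric(0)
  # In this sketch the budget is only checked between batches,
  # so a large batch_size can overshoot max_evals.
  while (length(scores) < max_evals) {
    batch = draw_feature_sets(feature_names, batch_size, max_features)
    scores = c(scores, apply(batch, 1L, score))
    archive = rbind(archive, batch)
  }
  list(archive = archive, scores = scores, best = archive[which.min(scores), ])
}

# Toy score (lower is better): mismatches against an arbitrary "ideal" mask.
toy_score = function(mask) sum(mask != c(TRUE, TRUE, FALSE, FALSE, FALSE))

set.seed(1)
res = random_search(paste0("f", 1:5), toy_score,
                    batch_size = 4, max_evals = 10, max_features = 3)
nrow(res$archive)  # three batches of 4 are evaluated, so 12 rows
```

With batch_size = 4 and a budget of 10 evaluations, the sketch stops only after the third batch. Setting batch_size = 1 would stop at exactly 10 evaluations, at the cost of evaluating one set at a time.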
Control Parameters
max_features
integer(1)
Maximum number of features in a feature set. Defaults to the number of features in the mlr3::Task.
batch_size
integer(1)
Maximum number of feature sets to try in a batch.
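As a brief sketch of how these control parameters might be set, the snippet below passes them to fs() at construction time. It assumes the mlr3fselect package is installed; the values 5 and 3 are arbitrary and chosen only for illustration.

```r
library(mlr3fselect)

# construct the random search fselector with explicit control parameters
fselector = fs("random_search", batch_size = 5L, max_features = 3L)

# inspect the values stored in the parameter set
fselector$param_set$values
```

A smaller batch_size such as this one means the terminator is consulted more often, in line with the trade-off described under Details.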
Super classes
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchRandomSearch
Examples
# Feature Selection
# \donttest{
# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")
# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)
# best performing feature subset
instance$result
#> bill_depth bill_length body_mass flipper_length island sex year
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: TRUE TRUE TRUE TRUE TRUE FALSE FALSE
#> features n_features classif.ce
#> <list> <int> <num>
#> 1: bill_depth,bill_length,body_mass,flipper_length,island 5 0.08695652
# all evaluated feature subsets
as.data.table(instance$archive)
#> bill_depth bill_length body_mass flipper_length island sex year
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: FALSE FALSE FALSE FALSE FALSE TRUE FALSE
#> 2: TRUE TRUE TRUE TRUE TRUE FALSE FALSE
#> 3: TRUE TRUE FALSE TRUE TRUE TRUE TRUE
#> 4: TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 5: FALSE FALSE TRUE TRUE TRUE TRUE FALSE
#> 6: FALSE TRUE TRUE TRUE TRUE TRUE FALSE
#> 7: FALSE FALSE TRUE FALSE TRUE TRUE TRUE
#> 8: TRUE FALSE TRUE FALSE FALSE FALSE FALSE
#> 9: TRUE FALSE TRUE TRUE TRUE TRUE TRUE
#> 10: TRUE TRUE TRUE TRUE TRUE FALSE TRUE
#> classif.ce runtime_learners timestamp batch_nr warnings errors
#> <num> <num> <POSc> <int> <int> <int>
#> 1: 0.54782609 0.004 2026-03-19 10:30:34 1 0 0
#> 2: 0.08695652 0.005 2026-03-19 10:30:34 1 0 0
#> 3: 0.08695652 0.006 2026-03-19 10:30:34 1 0 0
#> 4: 0.08695652 0.007 2026-03-19 10:30:34 1 0 0
#> 5: 0.25217391 0.006 2026-03-19 10:30:34 1 0 0
#> 6: 0.08695652 0.006 2026-03-19 10:30:34 1 0 0
#> 7: 0.31304348 0.006 2026-03-19 10:30:34 1 0 0
#> 8: 0.26956522 0.005 2026-03-19 10:30:34 1 0 0
#> 9: 0.25217391 0.005 2026-03-19 10:30:34 1 0 0
#> 10: 0.08695652 0.005 2026-03-19 10:30:34 1 0 0
#> features
#> <list>
#> 1: sex
#> 2: bill_depth,bill_length,body_mass,flipper_length,island
#> 3: bill_depth,bill_length,flipper_length,island,sex,year
#> 4: bill_depth,bill_length,body_mass,flipper_length,island,sex,...[7]
#> 5: body_mass,flipper_length,island,sex
#> 6: bill_length,body_mass,flipper_length,island,sex
#> 7: body_mass,island,sex,year
#> 8: bill_depth,body_mass
#> 9: bill_depth,body_mass,flipper_length,island,sex,year
#> 10: bill_depth,bill_length,body_mass,flipper_length,island,year
#> n_features resample_result
#> <list> <list>
#> 1: 1 <ResampleResult>
#> 2: 5 <ResampleResult>
#> 3: 6 <ResampleResult>
#> 4: 7 <ResampleResult>
#> 5: 4 <ResampleResult>
#> 6: 5 <ResampleResult>
#> 7: 4 <ResampleResult>
#> 8: 2 <ResampleResult>
#> 9: 6 <ResampleResult>
#> 10: 6 <ResampleResult>
# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }