Skip to contents

Feature selection using user-defined feature sets.

Details

The feature sets are evaluated in order as given.

The feature selection terminates itself when all feature sets are evaluated. It is not necessary to set a termination criterion.

Dictionary

This FSelector can be instantiated with the associated sugar function fs():

fs("design_points")

Parameters

batch_size

integer(1)
Maximum number of configurations to try in a batch.

design

data.table::data.table
Design points to try in search, one per row.

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.


Method clone()

The objects of this class are cloneable with this method.

Usage

FSelectorBatchDesignPoints$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Feature Selection
# \donttest{

# retrieve task and load learner
task = tsk("pima")
learner = lrn("classif.rpart")

# create design
design = mlr3misc::rowwise_table(
  ~age, ~glucose, ~insulin, ~mass, ~pedigree, ~pregnant, ~pressure, ~triceps,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,       FALSE,    TRUE,
  TRUE, TRUE,     FALSE,    TRUE,  FALSE,     TRUE,       FALSE,    FALSE,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,       FALSE,    FALSE,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,       TRUE,     TRUE
)

# run feature selection on the Pima Indians diabetes data set
instance = fselect(
  fselector = fs("design_points", design = design),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce")
)

# best performing feature set
instance$result
#>       age glucose insulin   mass pedigree pregnant pressure triceps
#>    <lgcl>  <lgcl>  <lgcl> <lgcl>   <lgcl>   <lgcl>   <lgcl>  <lgcl>
#> 1:   TRUE    TRUE   FALSE   TRUE    FALSE     TRUE    FALSE   FALSE
#>                     features n_features classif.ce
#>                       <list>      <int>      <num>
#> 1: age,glucose,mass,pregnant          4  0.2382812

# all evaluated feature sets
as.data.table(instance$archive)
#>       age glucose insulin   mass pedigree pregnant pressure triceps classif.ce
#>    <lgcl>  <lgcl>  <lgcl> <lgcl>   <lgcl>   <lgcl>   <lgcl>  <lgcl>      <num>
#> 1:   TRUE   FALSE    TRUE   TRUE    FALSE     TRUE    FALSE    TRUE  0.2851562
#> 2:   TRUE    TRUE   FALSE   TRUE    FALSE     TRUE    FALSE   FALSE  0.2382812
#> 3:   TRUE   FALSE    TRUE   TRUE    FALSE     TRUE    FALSE   FALSE  0.2617188
#> 4:   TRUE   FALSE    TRUE   TRUE    FALSE     TRUE     TRUE    TRUE  0.3203125
#>    runtime_learners           timestamp batch_nr warnings errors
#>               <num>              <POSc>    <int>    <int>  <int>
#> 1:            0.008 2025-01-16 10:17:50        1        0      0
#> 2:            0.007 2025-01-16 10:17:50        2        0      0
#> 3:            0.008 2025-01-16 10:17:50        3        0      0
#> 4:            0.009 2025-01-16 10:17:51        4        0      0
#>                                      features n_features  resample_result
#>                                        <list>     <list>           <list>
#> 1:          age,insulin,mass,pregnant,triceps          5 <ResampleResult>
#> 2:                  age,glucose,mass,pregnant          4 <ResampleResult>
#> 3:                  age,insulin,mass,pregnant          4 <ResampleResult>
#> 4: age,insulin,mass,pregnant,pressure,triceps          6 <ResampleResult>

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }