Shadow variable search creates a permuted copy (a shadow variable) of each feature and stops when one of these shadow variables is selected.

The feature selection terminates on its own as soon as the first shadow variable is selected, so it is not necessary to set a termination criterion.
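The idea can be illustrated with a minimal base R sketch. It is not the mlr3fselect implementation: the greedy forward search, the linear model used for scoring, the toy data and the helper names `add_shadows()` and `shadow_forward_search()` are all assumptions made for illustration. Only the `permuted__` prefix is taken from the archive columns shown in the Examples section.

```r
# Illustrative sketch only; NOT the mlr3fselect implementation.
set.seed(1)

# Toy data (assumption): the target depends on x1 and x2, x3 is pure noise.
n = 200
features = data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
target = 2 * features$x1 - 1.5 * features$x2 + rnorm(n)

# Append a permuted ("shadow") copy of every feature.
# Permuting a column breaks its relationship with the target.
add_shadows = function(features) {
  shadows = as.data.frame(lapply(features, sample))
  names(shadows) = paste0("permuted__", names(features))
  cbind(features, shadows)
}

# Greedy forward selection (assumed search strategy) that stops as soon as
# the best candidate is a shadow variable. No other stopping rule is used.
shadow_forward_search = function(features, target) {
  augmented = add_shadows(features)
  selected = character(0)
  repeat {
    candidates = setdiff(names(augmented), selected)
    # Score each candidate by the residual sum of squares of a linear model
    # fitted on the already selected features plus the candidate.
    scores = vapply(candidates, function(candidate) {
      frame = data.frame(
        target = target,
        augmented[, c(selected, candidate), drop = FALSE]
      )
      sum(residuals(lm(target ~ ., data = frame))^2)
    }, numeric(1))
    best = names(which.min(scores))
    # A shadow variable won the round: terminate without adding it.
    if (startsWith(best, "permuted__")) break
    selected = c(selected, best)
  }
  selected
}

selected = shadow_forward_search(features, target)
print(selected)
```

Because a shadow variable carries no information about the target, the round in which one of them scores best marks the point where the remaining real features are no more useful than noise. On this toy data the two informative features should be picked before any shadow variable, while the noise feature `x3` may or may not be picked.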

Source

Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1-8. doi:10.1155/2017/1421409.

Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235-243. doi:10.1198/016214506000000843.

Dictionary

This FSelector can be instantiated via the dictionary mlr_fselectors or with the associated sugar function fs():

mlr_fselectors$get("shadow_variable_search")
fs("shadow_variable_search")

Super class

mlr3fselect::FSelector -> FSelectorShadowVariableSearch

Methods


Method new()

Creates a new instance of this R6 class.


Method optimization_path()

Returns the optimization path.

Usage

FSelectorShadowVariableSearch$optimization_path(inst)

Arguments

inst

(FSelectInstanceSingleCrit)
Instance optimized with FSelectorShadowVariableSearch.


Method clone()

The objects of this class are cloneable with this method.

Usage

FSelectorShadowVariableSearch$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

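# load mlr3fselect (also attaches mlr3, which provides tsk(), lrn(), rsmp() and msr())
library(mlr3fselect)

# retrieve task
task = tsk("pima")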

# load learner
learner = lrn("classif.rpart")

# \donttest{
# feature selection on the Pima Indians Diabetes data set
instance = fselect(
  method = "shadow_variable_search",
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
)

# best performing feature subset
instance$result
#>      age glucose insulin  mass pedigree pregnant pressure triceps features
#> 1: FALSE    TRUE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  glucose
#>    classif.ce
#> 1:  0.3320312

# all evaluated feature subsets
as.data.table(instance$archive)
#>       age glucose insulin  mass pedigree pregnant pressure triceps classif.ce
#>  1:  TRUE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3945312
#>  2: FALSE    TRUE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3320312
#>  3: FALSE   FALSE    TRUE FALSE    FALSE    FALSE    FALSE   FALSE  0.3906250
#>  4: FALSE   FALSE   FALSE  TRUE    FALSE    FALSE    FALSE   FALSE  0.3789062
#>  5: FALSE   FALSE   FALSE FALSE     TRUE    FALSE    FALSE   FALSE  0.3867188
#>  6: FALSE   FALSE   FALSE FALSE    FALSE     TRUE    FALSE   FALSE  0.4140625
#>  7: FALSE   FALSE   FALSE FALSE    FALSE    FALSE     TRUE   FALSE  0.4335938
#>  8: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE    TRUE  0.3984375
#>  9: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3984375
#> 10: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3984375
#> 11: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.4375000
#> 12: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.4023438
#> 13: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3984375
#> 14: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3984375
#> 15: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.3984375
#> 16: FALSE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE  0.4140625
#>     runtime_learners           timestamp batch_nr permuted__age
#>  1:            0.078 2022-08-25 10:41:09        1         FALSE
#>  2:            0.109 2022-08-25 10:41:09        1         FALSE
#>  3:            0.082 2022-08-25 10:41:09        1         FALSE
#>  4:            0.102 2022-08-25 10:41:09        1         FALSE
#>  5:            0.091 2022-08-25 10:41:09        1         FALSE
#>  6:            0.091 2022-08-25 10:41:09        1         FALSE
#>  7:            0.098 2022-08-25 10:41:09        1         FALSE
#>  8:            0.078 2022-08-25 10:41:09        1         FALSE
#>  9:            0.138 2022-08-25 10:41:09        1          TRUE
#> 10:            0.078 2022-08-25 10:41:09        1         FALSE
#> 11:            0.094 2022-08-25 10:41:09        1         FALSE
#> 12:            0.095 2022-08-25 10:41:09        1         FALSE
#> 13:            0.077 2022-08-25 10:41:09        1         FALSE
#> 14:            0.106 2022-08-25 10:41:09        1         FALSE
#> 15:            0.081 2022-08-25 10:41:09        1         FALSE
#> 16:            0.094 2022-08-25 10:41:09        1         FALSE
#>     permuted__glucose permuted__insulin permuted__mass permuted__pedigree
#>  1:             FALSE             FALSE          FALSE              FALSE
#>  2:             FALSE             FALSE          FALSE              FALSE
#>  3:             FALSE             FALSE          FALSE              FALSE
#>  4:             FALSE             FALSE          FALSE              FALSE
#>  5:             FALSE             FALSE          FALSE              FALSE
#>  6:             FALSE             FALSE          FALSE              FALSE
#>  7:             FALSE             FALSE          FALSE              FALSE
#>  8:             FALSE             FALSE          FALSE              FALSE
#>  9:             FALSE             FALSE          FALSE              FALSE
#> 10:              TRUE             FALSE          FALSE              FALSE
#> 11:             FALSE              TRUE          FALSE              FALSE
#> 12:             FALSE             FALSE           TRUE              FALSE
#> 13:             FALSE             FALSE          FALSE               TRUE
#> 14:             FALSE             FALSE          FALSE              FALSE
#> 15:             FALSE             FALSE          FALSE              FALSE
#> 16:             FALSE             FALSE          FALSE              FALSE
#>     permuted__pregnant permuted__pressure permuted__triceps
#>  1:              FALSE              FALSE             FALSE
#>  2:              FALSE              FALSE             FALSE
#>  3:              FALSE              FALSE             FALSE
#>  4:              FALSE              FALSE             FALSE
#>  5:              FALSE              FALSE             FALSE
#>  6:              FALSE              FALSE             FALSE
#>  7:              FALSE              FALSE             FALSE
#>  8:              FALSE              FALSE             FALSE
#>  9:              FALSE              FALSE             FALSE
#> 10:              FALSE              FALSE             FALSE
#> 11:              FALSE              FALSE             FALSE
#> 12:              FALSE              FALSE             FALSE
#> 13:              FALSE              FALSE             FALSE
#> 14:               TRUE              FALSE             FALSE
#> 15:              FALSE               TRUE             FALSE
#> 16:              FALSE              FALSE              TRUE
#>          resample_result
#>  1: <ResampleResult[21]>
#>  2: <ResampleResult[21]>
#>  3: <ResampleResult[21]>
#>  4: <ResampleResult[21]>
#>  5: <ResampleResult[21]>
#>  6: <ResampleResult[21]>
#>  7: <ResampleResult[21]>
#>  8: <ResampleResult[21]>
#>  9: <ResampleResult[21]>
#> 10: <ResampleResult[21]>
#> 11: <ResampleResult[21]>
#> 12: <ResampleResult[21]>
#> 13: <ResampleResult[21]>
#> 14: <ResampleResult[21]>
#> 15: <ResampleResult[21]>
#> 16: <ResampleResult[21]>

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
# }