stacks 0.2.3

@simonpcouch simonpcouch released this 13 May 13:27
· 152 commits to main since this release
03ccbf4

Although stacks 0.2.3 is a minor release, it includes several significant user experience improvements. This release adds an option that can substantially reduce runtime for prediction blending, makes errors and warnings more informative, and greatly reduces the in-memory size of reloaded model objects.

Regarding that first point, take a look at how adjusting the times argument in blend_predictions drastically affects its runtime:

library(tidymodels)
library(modeldata)
  
# using a version of the package where `times` is a param
library(stacks)

data("lending_club")

set.seed(1)
lending_club <- sample_n(lending_club, 1000)

folds <- vfold_cv(lending_club, v = 5)

lr_mod <- 
  linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet") %>%
  workflow(
    preprocessor = funded_amnt ~ int_rate + total_bal_il,
    spec = .
  ) %>%
  tune_grid(
    resamples = folds,
    control = control_stack_grid(),
    grid = 10
  )

system.time(
  stacks() %>%
    add_candidates(lr_mod) %>%
    blend_predictions(times = 25)
)
#>    user  system elapsed 
#>  10.280   0.112  10.550

system.time(
  stacks() %>%
    add_candidates(lr_mod) %>%
    blend_predictions(times = 10)
)
#>    user  system elapsed 
#>   4.424   0.050   4.554

system.time(
  stacks() %>%
    add_candidates(lr_mod) %>%
    blend_predictions(times = 4)
)
#>    user  system elapsed 
#>   2.158   0.018   2.194
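As the changelog notes, `times` is passed on to rsample::bootstraps when fitting stacking coefficients. The following is a minimal sketch of what that argument controls in rsample itself, not of the internals of blend_predictions; `mtcars` is only a stand-in dataset chosen for illustration.

```r
library(rsample)

set.seed(1)

# Each row of the result is one bootstrap resample; `times` sets how many are drawn.
boots_default <- bootstraps(mtcars, times = 25)
boots_reduced <- bootstraps(mtcars, times = 4)

nrow(boots_default)
#> [1] 25
nrow(boots_reduced)
#> [1] 4
```

Fewer resamples means fewer model fits when tuning the meta-learner, which is consistent with the roughly proportional drop in elapsed time shown above.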

Related to the second point, there are several different degrees and varieties of tuning "failure" that result in stacks tripping up during model stacking. The package now inspects its inputs more closely and may give you a heads up when you might run into issues later on. Look out for warnings like:

#> Warning message:
#> The inputted `candidates` argument `my_tuning_results` generated notes during tuning/resampling. 
#> Model stacking may fail due to these issues; see `?collect_notes` if so. 
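If you see a warning like this, one way to inspect the notes is tune's collect_notes, which the warning points to. Below is a minimal sketch, assuming a recent version of tune that exports collect_notes; `res` is a hypothetical stand-in for `my_tuning_results`, fit on `mtcars` purely for illustration.

```r
library(tidymodels)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# `res` is a hypothetical stand-in for your own tuning/resampling results.
res <- fit_resamples(
  linear_reg(),
  mpg ~ wt + hp,
  resamples = folds
)

# Returns a tibble of any notes (warnings or errors) recorded per resample.
notes <- collect_notes(res)
notes
```

An empty tibble suggests that resampling completed without recorded issues; any rows present describe where and why a fit generated a note.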

And, finally, related to model stack object size, check out the results of butcher::weigh on a model stack like the one from the reprex above (after saving and reloading), before and after this release:

weigh(lr_stack_before)
#> # A tibble: 374 × 2
#>    object                                                               size
#>    <chr>                                                               <dbl>
#>  1 coefs.preproc.terms                                               24.7   
#>  2 coefs.fit.call                                                    24.7   
#>  3 coefs.spec.eng_args.lower.limits                                  24.7   
#>  4 coefs.spec.method.fit.args.lower.limits                           24.7   
#>  5 coefs.spec.method.pred.numeric.post                                1.79  
#>  6 member_fits.lr_mod_1_3.fit.fit.spec.method.pred.numeric.post       1.79  
#>  7 member_fits.lr_mod_1_1.fit.fit.spec.method.pred.numeric.post       1.79  
#>  8 member_fits.lr_mod_1_3.pre.actions.formula.blueprint.mold.process  0.0172
#>  9 member_fits.lr_mod_1_3.pre.mold.blueprint.mold.process             0.0172
#> 10 member_fits.lr_mod_1_1.pre.actions.formula.blueprint.mold.process  0.0172
#> # … with 364 more rows
weigh(lr_stack_after)
#> # A tibble: 374 × 2
#>    object                                                              size
#>    <chr>                                                              <dbl>
#>  1 coefs.preproc.terms                                               2.64  
#>  2 coefs.fit.call                                                    2.64  
#>  3 coefs.spec.eng_args.lower.limits                                  2.64  
#>  4 coefs.spec.method.fit.args.lower.limits                           2.64  
#>  5 coefs.spec.method.pred.numeric.post                               1.76  
#>  6 member_fits.lr_mod_1_3.fit.fit.spec.method.pred.numeric.post      1.76  
#>  7 member_fits.lr_mod_1_1.fit.fit.spec.method.pred.numeric.post      1.76  
#>  8 member_fits.lr_mod_1_3.pre.actions.formula.blueprint.mold.process 0.0172
#>  9 member_fits.lr_mod_1_3.pre.mold.blueprint.mold.process            0.0172
#> 10 member_fits.lr_mod_1_1.pre.actions.formula.blueprint.mold.process 0.0172
#> # … with 364 more rows
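To run a similar check on your own objects, a save-and-reload round trip followed by butcher::weigh is enough. The sketch below uses a plain `lm` fit on `mtcars` as a stand-in for a model stack (an assumption made to keep the example small); the same steps should apply to a fitted stack.

```r
library(butcher)

# Stand-in model object; substitute your own fitted model stack.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Save to a temporary file and reload, mirroring the "after saving and reloading" setup.
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)
reloaded <- readRDS(path)

# Per-component sizes (in MB by default), largest first.
sizes <- weigh(reloaded)
sizes
```

Comparing `weigh()` output for the in-session object and the reloaded one is a quick way to spot components whose size inflates after serialization.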

Read more about these changes and their implementations at the issues linked below.🐧

Changelog

  • Addressed deprecation warning in add_candidates (#99).
  • Improved clarity of warnings/errors related to failed hyperparameter tuning and resample fitting (#110).
  • Reduced model stack object size and fixed bug where object size of model stack inflated drastically after saving to file (#116). Also regenerated example objects with this change. Saved model objects may need to be regenerated in order to interface with newer versions of the package.
  • Introduced a times argument to blend_predictions that is passed on to rsample::bootstraps when fitting stacking coefficients. Reducing this argument from its default (25) greatly reduces the run time of blend_predictions (#94).
  • The package will now load packages necessary for model fitting at fit_members(), if available, and fail informatively if not (#118).
  • Fixed bug where meta-learner tuning would fail with outcome names and levels including the string "class" (#125).
  • The package will now warn when unused dots are passed to any of the core functions (#127).