diff --git a/DESCRIPTION b/DESCRIPTION index 0f291f65..facd690b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: stacks Title: Tidy Model Stacking -Version: 0.1.0.9000 +Version: 0.2.0 Authors@R: c( person(given = "Simon", family = "Couch", diff --git a/NEWS.md b/NEWS.md index 28eea066..1b50be51 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,8 +1,6 @@ # stacks -### v0.1.0.9000 - -Developmental version, to be released as v0.2.0. +### v0.2.0 ### Breaking changes diff --git a/R/example_data.R b/R/example_data.R index 67a071cb..4b549f25 100644 --- a/R/example_data.R +++ b/R/example_data.R @@ -46,8 +46,7 @@ #' #' @source #' Julie Jung et al. (2020) Multimodal mechanosensing enables treefrog -#' embryos to escape egg-predators. -#' \url{https://doi.org/10.1242/jeb.236141} +#' embryos to escape egg-predators. \doi{10.1242/jeb.236141} #' #' @name example_data NULL diff --git a/R/tree_frogs.R b/R/tree_frogs.R index 4083c0cd..e411f099 100644 --- a/R/tree_frogs.R +++ b/R/tree_frogs.R @@ -46,6 +46,5 @@ #' @source #' #' Julie Jung et al. (2020) Multimodal mechanosensing enables treefrog -#' embryos to escape egg-predators. -#' \url{https://doi.org/10.1242/jeb.236141} +#' embryos to escape egg-predators. \doi{10.1242/jeb.236141} "tree_frogs" \ No newline at end of file diff --git a/README.Rmd b/README.Rmd index 091544e7..b8463152 100644 --- a/README.Rmd +++ b/README.Rmd @@ -43,7 +43,7 @@ stacks is generalized with respect to: * Cross-validation scheme: Any resampling algorithm implemented in [rsample](https://rsample.tidymodels.org/) or adjacent packages is fair game for resampling data for use in training a model stack. * Error metric: Any metric function implemented in [yardstick](https://yardstick.tidymodels.org/) or adjacent packages is fair game for evaluating model stacks and their members. That package provides some infrastructure for creating your own metric functions as well! -stacks uses a regularized linear model to combine predictions from ensemble members, though this model type is only one of many possible learning algorithms that could be used to fit a stacked ensemble model. For implementations of additional ensemble learning algorithms, check out [h2o](http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.stackedEnsemble.html) and [SuperLearner](https://cran.r-project.org/web/packages/SuperLearner/SuperLearner.pdf). +stacks uses a regularized linear model to combine predictions from ensemble members, though this model type is only one of many possible learning algorithms that could be used to fit a stacked ensemble model. For implementations of additional ensemble learning algorithms, check out [h2o](http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.stackedEnsemble.html) and [SuperLearner](https://CRAN.R-project.org/package=SuperLearner). Rather than diving right into the implementation, we'll focus here on how the pieces fit together, conceptually, in building an ensemble with `stacks`. See the `basics` vignette for an example of the API in action! diff --git a/README.md b/README.md index 035ce581..50671175 100644 --- a/README.md +++ b/README.md @@ -71,7 +71,7 @@ For implementations of additional ensemble learning algorithms, check out [h2o](http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.stackedEnsemble.html) and -[SuperLearner](https://cran.r-project.org/web/packages/SuperLearner/SuperLearner.pdf). +[SuperLearner](https://CRAN.R-project.org/package=SuperLearner). 
Rather than diving right into the implementation, we’ll focus here on how the pieces fit together, conceptually, in building an ensemble with diff --git a/cran-comments.md b/cran-comments.md index 224354c4..a8e1b223 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,11 +1,8 @@ -# stacks 0.1.0 - -This submission is a resubmission following a request to adjust a URL with -200 status. +# stacks 0.2.0 ## Test environments -* local macOS install, R 3.6.3 +* local macOS install, R 4.0.3 * macOS (on github actions), release * ubuntu 16.04 (on github actions), release, devel * windows (on github actions), R 3.6.3, release @@ -13,15 +10,7 @@ This submission is a resubmission following a request to adjust a URL with ## R CMD check results -There were 0 ERRORs, 0 WARNINGs, and 0 NOTEs on all systems except -the following NOTE on win-builder: - -``` -* checking CRAN incoming feasibility ... NOTE -Maintainer: 'Simon Couch ' - -New submission -``` +There were no ERRORs, WARNINGs, or NOTEs. ## Reverse dependencies diff --git a/data/class_res_nn.rda b/data/class_res_nn.rda index 4f66186a..5ccc1189 100644 Binary files a/data/class_res_nn.rda and b/data/class_res_nn.rda differ diff --git a/data/class_res_rf.rda b/data/class_res_rf.rda index cace043d..5bd57b03 100644 Binary files a/data/class_res_rf.rda and b/data/class_res_rf.rda differ diff --git a/data/log_res_nn.rda b/data/log_res_nn.rda index b5ae0dde..80c25760 100644 Binary files a/data/log_res_nn.rda and b/data/log_res_nn.rda differ diff --git a/data/log_res_rf.rda b/data/log_res_rf.rda index 31ae70f1..440e8352 100644 Binary files a/data/log_res_rf.rda and b/data/log_res_rf.rda differ diff --git a/data/reg_folds.rda b/data/reg_folds.rda index 51ad8a26..24edff98 100644 Binary files a/data/reg_folds.rda and b/data/reg_folds.rda differ diff --git a/data/reg_res_lr.rda b/data/reg_res_lr.rda index 955dd56f..76f953f2 100644 Binary files a/data/reg_res_lr.rda and b/data/reg_res_lr.rda differ diff --git a/data/reg_res_sp.rda b/data/reg_res_sp.rda index b32f1276..c632c61b 100644 Binary files a/data/reg_res_sp.rda and b/data/reg_res_sp.rda differ diff --git a/data/reg_res_svm.rda b/data/reg_res_svm.rda index 01fb79e6..dcacf7a5 100644 Binary files a/data/reg_res_svm.rda and b/data/reg_res_svm.rda differ diff --git a/data/tree_frogs_reg_test.rda b/data/tree_frogs_reg_test.rda index d6b2e4d0..2b1f85ef 100644 Binary files a/data/tree_frogs_reg_test.rda and b/data/tree_frogs_reg_test.rda differ diff --git a/docs/404.html b/docs/404.html index cf1b148a..703aa1fc 100644 --- a/docs/404.html +++ b/docs/404.html @@ -94,7 +94,7 @@ stacks @@ -102,7 +102,7 @@ diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html index e0a02cf0..c7fe2bb4 100644 --- a/docs/CODE_OF_CONDUCT.html +++ b/docs/CODE_OF_CONDUCT.html @@ -94,7 +94,7 @@ stacks @@ -102,7 +102,7 @@ diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index cdfb47d0..a620fa3d 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -94,7 +94,7 @@ stacks @@ -102,7 +102,7 @@ diff --git a/docs/LICENSE.html b/docs/LICENSE.html index 0185db1f..04f65d5c 100644 --- a/docs/LICENSE.html +++ b/docs/LICENSE.html @@ -94,7 +94,7 @@ stacks @@ -102,7 +102,7 @@ diff --git a/docs/articles/basics.html b/docs/articles/basics.html index f27b278d..03bc114a 100644 --- a/docs/articles/basics.html +++ b/docs/articles/basics.html @@ -50,7 +50,7 @@ stacks @@ -58,7 +58,7 @@ @@ -94,14 +99,13 @@ - -
+
@@ -122,29 +126,29 @@

Getting Started With stacks

At a high level, the ensembling workflow goes something like this:

• Define candidate ensemble members using functionality from rsample, parsnip, workflows, recipes, and tune
• Initialize a data stack with stacks()
• Iteratively add candidate ensemble members to the data stack with add_candidates()
• Determine how to combine their predictions with blend_predictions()
• Fit candidate ensemble members with nonzero stacking coefficients with fit_members()
• Predict on new data with predict()!
The package is closely integrated with the rest of the functionality in tidymodels—we’ll load those packages as well, in addition to some tidyverse packages to evaluate our results later on.

    - +

    In this example, we’ll make use of the tree_frogs data exported with stacks, giving experimental results on hatching behavior of red-eyed tree frog embryos!

Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimulus from their environment. To do so, they subjected the embryos at varying developmental stages to “predator stimulus” by jiggling the embryos with a blunt probe. Beforehand, though, some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ). Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

    We’ll start out with predicting latency (i.e. time to hatch) based on other attributes. We’ll need to filter out NAs (i.e. cases where the embryo did not hatch) first.

    -
    -data("tree_frogs")
    +
    +data("tree_frogs")
     
     # subset the data
    -tree_frogs <- tree_frogs %>%
    -  filter(!is.na(latency)) %>%
    -  select(-c(clutch, hatched))
    +tree_frogs <- tree_frogs %>% + filter(!is.na(latency)) %>% + select(-c(clutch, hatched))

    Taking a quick look at the data, it seems like the hatch time is pretty closely related to some of our predictors!

    -
    -library(ggplot2)
    +
    +library(ggplot2)
     
    -ggplot(tree_frogs) +
    -  aes(x = age, y = latency, color = treatment) +
    -  geom_point() +
    -  labs(x = "Embryo Age (s)", y = "Time to Hatch (s)", col = "Treatment")
    +ggplot(tree_frogs) + + aes(x = age, y = latency, color = treatment) + + geom_point() + + labs(x = "Embryo Age (s)", y = "Time to Hatch (s)", col = "Treatment")

    Let’s give this a go!

    @@ -157,55 +161,55 @@

  • Each model definition must share the same rsample rset object.
We’ll first start out by splitting up the training data, generating resamples, and setting some options that will be used by each model definition.

    -
    -# some setup: resampling and a basic recipe
    -set.seed(1)
    -tree_frogs_split <- initial_split(tree_frogs)
    -tree_frogs_train <- training(tree_frogs_split)
    -tree_frogs_test  <- testing(tree_frogs_split)
    +
    +# some setup: resampling and a basic recipe
    +set.seed(1)
    +tree_frogs_split <- initial_split(tree_frogs)
    +tree_frogs_train <- training(tree_frogs_split)
    +tree_frogs_test  <- testing(tree_frogs_split)
     
    -set.seed(1)
    -folds <- rsample::vfold_cv(tree_frogs_train, v = 5)
    +set.seed(1)
    +folds <- rsample::vfold_cv(tree_frogs_train, v = 5)
     
    -tree_frogs_rec <- 
    -  recipe(latency ~ ., data = tree_frogs_train)
    +tree_frogs_rec <- 
    +  recipe(latency ~ ., data = tree_frogs_train)
     
    -metric <- metric_set(rmse)
    +metric <- metric_set(rmse)

Tuning and fitting results for use in ensembles need to be generated with the control arguments save_pred = TRUE and save_workflow = TRUE—these settings ensure that the assessment set predictions, as well as the workflow used to fit the resamples, are stored in the resulting object. For convenience, stacks supplies some control_stack_*() functions to generate the appropriate objects for you.

    In this example, we’ll be working with tune_grid() and fit_resamples() from the tune package, so we will use the following control settings:

    -
    -ctrl_grid <- control_stack_grid()
    -ctrl_res <- control_stack_resamples()
    +
    +ctrl_grid <- control_stack_grid()
    +ctrl_res <- control_stack_resamples()

    We’ll define three different model definitions to try to predict time to hatch—a K-nearest neighbors model (with hyperparameters to tune), a linear model, and a support vector machine model (again, with hyperparameters to tune).

    Starting out with K-nearest neighbors, we begin by creating a parsnip model specification:

    -
    -# create a model definition
    -knn_spec <-
    -  nearest_neighbor(
    -    mode = "regression", 
    -    neighbors = tune("k")
    -  ) %>%
    -  set_engine("kknn")
    -
    -knn_spec
    +
    +# create a model definition
    +knn_spec <-
    +  nearest_neighbor(
    +    mode = "regression", 
    +    neighbors = tune("k")
    +  ) %>%
    +  set_engine("kknn")
    +
    +knn_spec
     #> K-Nearest Neighbor Model Specification (regression)
     #> 
     #> Main Arguments:
     #>   neighbors = tune("k")
     #> 
    -#> Computational engine: kknn
    +#> Computational engine: kknn

    Note that, since we are tuning over several possible numbers of neighbors, this model specification defines multiple model configurations. The specific form of those configurations will be determined when specifying the grid search in tune_grid().
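If we wanted to choose the candidate neighbor values ourselves rather than letting the grid search generate them, tune_grid() also accepts a data frame of parameter values whose column names match the tuning ids. A quick sketch with illustrative values (note the column name k, matching tune("k") above):

# hypothetical explicit grid; passing grid = 4 below generates one automatically
knn_grid <- tibble::tibble(k = c(1, 5, 10, 15))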

    From here, we extend the basic recipe defined earlier to fully specify the form of the design matrix for use in a K-nearest neighbors model:

    -
    -# extend the recipe
    -knn_rec <-
    -  tree_frogs_rec %>%
    -  step_dummy(all_nominal()) %>%
    -  step_zv(all_predictors(), skip = TRUE) %>%
    -  step_meanimpute(all_numeric(), skip = TRUE) %>%
    -  step_normalize(all_numeric(), skip = TRUE)
    -
    -knn_rec
    +
    +# extend the recipe
    +knn_rec <-
    +  tree_frogs_rec %>%
    +  step_dummy(all_nominal()) %>%
    +  step_zv(all_predictors(), skip = TRUE) %>%
    +  step_meanimpute(all_numeric(), skip = TRUE) %>%
    +  step_normalize(all_numeric(), skip = TRUE)
    +
    +knn_rec
     #> Data Recipe
     #> 
     #> Inputs:
    @@ -219,17 +223,17 @@ 

    #> Dummy variables from all_nominal() #> Zero variance filter on all_predictors() #> Mean Imputation for all_numeric() -#> Centering and scaling for all_numeric()

    +#> Centering and scaling for all_numeric()

Starting with the basic recipe, we convert categorical variables to dummy variables, remove predictors with zero variance (columns containing only a single value), impute missing values in numeric variables using the mean, and normalize numeric predictors. Pre-processing instructions for the remaining models are defined similarly.

    Now, we combine the model specification and pre-processing instructions defined above to form a workflow object:

    -
    -# add both to a workflow
    -knn_wflow <- 
    -  workflow() %>% 
    -  add_model(knn_spec) %>%
    -  add_recipe(knn_rec)
    -
    -knn_wflow
    +
    +# add both to a workflow
    +knn_wflow <- 
    +  workflow() %>% 
    +  add_model(knn_spec) %>%
    +  add_recipe(knn_rec)
    +
    +knn_wflow
     #> ══ Workflow ════════════════════════════════════════════════════════════════════
     #> Preprocessor: Recipe
     #> Model: nearest_neighbor()
    @@ -248,155 +252,120 @@ 

    #> Main Arguments: #> neighbors = tune("k") #> -#> Computational engine: kknn

    +#> Computational engine: kknn

    Finally, we can make use of the workflow, training set resamples, metric set, and control object to tune our hyperparameters. Using the grid argument, we specify that we would like to optimize over four possible values of k using a grid search.

    -
    -# tune k and fit to the 5-fold cv
    -set.seed(2020)
    -knn_res <- 
    -  tune_grid(
    -    knn_wflow,
    -    resamples = folds,
    -    metrics = metric,
    -    grid = 4,
    -    control = ctrl_grid
    -  )
    -#> Loading required package: scales
    -#> 
    -#> Attaching package: 'scales'
    -#> The following object is masked from 'package:purrr':
    -#> 
    -#>     discard
    -#> 
    -#> Attaching package: 'rlang'
    -#> The following objects are masked from 'package:purrr':
    -#> 
    -#>     %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
    -#>     flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
    -#>     splice
    -#> 
    -#> Attaching package: 'vctrs'
    -#> The following object is masked from 'package:tibble':
    -#> 
    -#>     data_frame
    -#> The following object is masked from 'package:dplyr':
    -#> 
    -#>     data_frame
    -#> The workflow being saved contains a recipe, which is 0.23 Mb in memory. If this was not intentional, please set the control setting `save_workflow = FALSE`.
    -
    -knn_res
    +
    +# tune k and fit to the 5-fold cv
    +set.seed(2020)
    +knn_res <- 
    +  tune_grid(
    +    knn_wflow,
    +    resamples = folds,
    +    metrics = metric,
    +    grid = 4,
    +    control = ctrl_grid
    +  )
    +
    +knn_res
     #> # Tuning results
     #> # 5-fold cross-validation 
     #> # A tibble: 5 x 5
    -#>   splits           id    .metrics         .notes           .predictions      
    -#>   <list>           <chr> <list>           <list>           <list>            
    -#> 1 <split [343/86]> Fold1 <tibble [4 × 5]> <tibble [0 × 1]> <tibble [344 × 5]>
    -#> 2 <split [343/86]> Fold2 <tibble [4 × 5]> <tibble [0 × 1]> <tibble [344 × 5]>
    -#> 3 <split [343/86]> Fold3 <tibble [4 × 5]> <tibble [0 × 1]> <tibble [344 × 5]>
    -#> 4 <split [343/86]> Fold4 <tibble [4 × 5]> <tibble [0 × 1]> <tibble [344 × 5]>
    -#> 5 <split [344/85]> Fold5 <tibble [4 × 5]> <tibble [0 × 1]> <tibble [340 × 5]>
    +#> splits id .metrics .notes .predictions +#> <list> <chr> <list> <list> <list> +#> 1 <split [343/86… Fold1 <tibble[,5] [4 ×… <tibble[,1] [0 ×… <tibble[,5] [344 × … +#> 2 <split [343/86… Fold2 <tibble[,5] [4 ×… <tibble[,1] [0 ×… <tibble[,5] [344 × … +#> 3 <split [343/86… Fold3 <tibble[,5] [4 ×… <tibble[,1] [0 ×… <tibble[,5] [344 × … +#> 4 <split [343/86… Fold4 <tibble[,5] [4 ×… <tibble[,1] [0 ×… <tibble[,5] [344 × … +#> 5 <split [344/85… Fold5 <tibble[,5] [4 ×… <tibble[,1] [0 ×… <tibble[,5] [340 × …

    This knn_res object fully specifies the candidate members, and is ready to be included in a stacks workflow.

    Now, specifying the linear model, note that we are not optimizing over any hyperparameters. Thus, we use the fit_resamples() function rather than tune_grid() or tune_bayes() when fitting to our resamples.

    -
    -# create a model definition
    -lin_reg_spec <-
    -  linear_reg() %>%
    -  set_engine("lm")
    +
    +# create a model definition
    +lin_reg_spec <-
    +  linear_reg() %>%
    +  set_engine("lm")
     
     # extend the recipe
    -lin_reg_rec <-
    -  tree_frogs_rec %>%
    -  step_dummy(all_nominal()) %>%
    -  step_zv(all_predictors(), skip = TRUE)
    +lin_reg_rec <-
    +  tree_frogs_rec %>%
    +  step_dummy(all_nominal()) %>%
    +  step_zv(all_predictors(), skip = TRUE)
     
     # add both to a workflow
    -lin_reg_wflow <- 
    -  workflow() %>%
    -  add_model(lin_reg_spec) %>%
    -  add_recipe(lin_reg_rec)
    +lin_reg_wflow <- 
    +  workflow() %>%
    +  add_model(lin_reg_spec) %>%
    +  add_recipe(lin_reg_rec)
     
     # fit to the 5-fold cv
    -set.seed(2020)
    -lin_reg_res <- 
    -  fit_resamples(
    -    lin_reg_wflow,
    -    resamples = folds,
    -    metrics = metric,
    -    control = ctrl_res
    -  )
    -#> The workflow being saved contains a recipe, which is 0.22 Mb in memory. If this was not intentional, please set the control setting `save_workflow = FALSE`.
    -
    -lin_reg_res
    +set.seed(2020)
    +lin_reg_res <- 
    +  fit_resamples(
    +    lin_reg_wflow,
    +    resamples = folds,
    +    metrics = metric,
    +    control = ctrl_res
    +  )
    +
    +lin_reg_res
     #> # Resampling results
     #> # 5-fold cross-validation 
     #> # A tibble: 5 x 5
    -#>   splits           id    .metrics         .notes           .predictions     
    -#>   <list>           <chr> <list>           <list>           <list>           
    -#> 1 <split [343/86]> Fold1 <tibble [1 × 4]> <tibble [0 × 1]> <tibble [86 × 4]>
    -#> 2 <split [343/86]> Fold2 <tibble [1 × 4]> <tibble [0 × 1]> <tibble [86 × 4]>
    -#> 3 <split [343/86]> Fold3 <tibble [1 × 4]> <tibble [0 × 1]> <tibble [86 × 4]>
    -#> 4 <split [343/86]> Fold4 <tibble [1 × 4]> <tibble [0 × 1]> <tibble [86 × 4]>
    -#> 5 <split [344/85]> Fold5 <tibble [1 × 4]> <tibble [0 × 1]> <tibble [85 × 4]>
    +#> splits id .metrics .notes .predictions +#> <list> <chr> <list> <list> <list> +#> 1 <split [343/86… Fold1 <tibble[,4] [1 × … <tibble[,1] [0 ×… <tibble[,4] [86 × … +#> 2 <split [343/86… Fold2 <tibble[,4] [1 × … <tibble[,1] [0 ×… <tibble[,4] [86 × … +#> 3 <split [343/86… Fold3 <tibble[,4] [1 × … <tibble[,1] [0 ×… <tibble[,4] [86 × … +#> 4 <split [343/86… Fold4 <tibble[,4] [1 × … <tibble[,1] [0 ×… <tibble[,4] [86 × … +#> 5 <split [344/85… Fold5 <tibble[,4] [1 × … <tibble[,1] [0 ×… <tibble[,4] [85 × …

    Finally, putting together the model definition for the support vector machine:

    -
    -# create a model definition
    -svm_spec <- 
    -  svm_rbf(
    -    cost = tune("cost"), 
    -    rbf_sigma = tune("sigma")
    -  ) %>%
    -  set_engine("kernlab") %>%
    -  set_mode("regression")
    +
    +# create a model definition
    +svm_spec <- 
    +  svm_rbf(
    +    cost = tune("cost"), 
    +    rbf_sigma = tune("sigma")
    +  ) %>%
    +  set_engine("kernlab") %>%
    +  set_mode("regression")
     
     # extend the recipe
    -svm_rec <-
    -  tree_frogs_rec %>%
    -  step_dummy(all_nominal()) %>%
    -  step_zv(all_predictors(), skip = TRUE) %>%
    -  step_meanimpute(all_numeric(), skip = TRUE) %>%
    -  step_corr(all_predictors(), skip = TRUE) %>%
    -  step_normalize(all_numeric(), skip = TRUE)
    +svm_rec <-
    +  tree_frogs_rec %>%
    +  step_dummy(all_nominal()) %>%
    +  step_zv(all_predictors(), skip = TRUE) %>%
    +  step_meanimpute(all_numeric(), skip = TRUE) %>%
    +  step_corr(all_predictors(), skip = TRUE) %>%
    +  step_normalize(all_numeric(), skip = TRUE)
     
     # add both to a workflow
    -svm_wflow <- 
    -  workflow() %>% 
    -  add_model(svm_spec) %>%
    -  add_recipe(svm_rec)
    +svm_wflow <- 
    +  workflow() %>% 
    +  add_model(svm_spec) %>%
    +  add_recipe(svm_rec)
     
     # tune cost and sigma and fit to the 5-fold cv
    -set.seed(2020)
    -svm_res <- 
    -  tune_grid(
    -    svm_wflow, 
    -    resamples = folds, 
    -    grid = 6,
    -    metrics = metric,
    -    control = ctrl_grid
    -  )
    -#> 
    -#> Attaching package: 'kernlab'
    -#> The following object is masked from 'package:scales':
    -#> 
    -#>     alpha
    -#> The following object is masked from 'package:ggplot2':
    -#> 
    -#>     alpha
    -#> The following object is masked from 'package:purrr':
    -#> 
    -#>     cross
    -#> The workflow being saved contains a recipe, which is 0.23 Mb in memory. If this was not intentional, please set the control setting `save_workflow = FALSE`.
    -
    -svm_res
    +set.seed(2020)
    +svm_res <- 
    +  tune_grid(
    +    svm_wflow, 
    +    resamples = folds, 
    +    grid = 6,
    +    metrics = metric,
    +    control = ctrl_grid
    +  )
    +
    +svm_res
     #> # Tuning results
     #> # 5-fold cross-validation 
     #> # A tibble: 5 x 5
    -#>   splits           id    .metrics         .notes           .predictions      
    -#>   <list>           <chr> <list>           <list>           <list>            
    -#> 1 <split [343/86]> Fold1 <tibble [6 × 6]> <tibble [0 × 1]> <tibble [516 × 6]>
    -#> 2 <split [343/86]> Fold2 <tibble [6 × 6]> <tibble [0 × 1]> <tibble [516 × 6]>
    -#> 3 <split [343/86]> Fold3 <tibble [6 × 6]> <tibble [0 × 1]> <tibble [516 × 6]>
    -#> 4 <split [343/86]> Fold4 <tibble [6 × 6]> <tibble [0 × 1]> <tibble [516 × 6]>
    -#> 5 <split [344/85]> Fold5 <tibble [6 × 6]> <tibble [0 × 1]> <tibble [510 × 6]>
    +#> splits id .metrics .notes .predictions +#> <list> <chr> <list> <list> <list> +#> 1 <split [343/86… Fold1 <tibble[,6] [6 ×… <tibble[,1] [0 ×… <tibble[,6] [516 × … +#> 2 <split [343/86… Fold2 <tibble[,6] [6 ×… <tibble[,1] [0 ×… <tibble[,6] [516 × … +#> 3 <split [343/86… Fold3 <tibble[,6] [6 ×… <tibble[,1] [0 ×… <tibble[,6] [516 × … +#> 4 <split [343/86… Fold4 <tibble[,6] [6 ×… <tibble[,1] [0 ×… <tibble[,6] [516 × … +#> 5 <split [344/85… Fold5 <tibble[,6] [6 ×… <tibble[,1] [0 ×… <tibble[,6] [510 × …

    Altogether, we’ve created three model definitions, where the K-nearest neighbors model definition specifies 4 model configurations, the linear regression specifies 1, and the support vector machine specifies 6.

    With these three model definitions fully specified, we are ready to begin stacking these model configurations. (Note that, in most applied settings, one would likely specify many more than 11 candidate members.)

    @@ -407,43 +376,43 @@

    The first step to building an ensemble with stacks is to create a data_stack object—in this package, data stacks are tibbles (with some extra attributes) that contain the assessment set predictions for each candidate ensemble member.

    We can initialize a data stack using the stacks() function.

    -
    -stacks()
    -#> # A data stack with 0 model definitions and 0 candidate members.
    +
    +stacks()
    +#> # A data stack with 0 model definitions and 0 candidate members.

    The stacks() function works sort of like the ggplot() constructor from ggplot2—the function creates a basic structure that the object will be built on top of—except you’ll pipe the outputs rather than adding them with +.

    The add_candidates() function adds ensemble members to the stack.

    -
    -tree_frogs_data_st <- 
    -  stacks() %>%
    -  add_candidates(knn_res) %>%
    -  add_candidates(lin_reg_res) %>%
    -  add_candidates(svm_res)
    -
    -tree_frogs_data_st
    +
    +tree_frogs_data_st <- 
    +  stacks() %>%
    +  add_candidates(knn_res) %>%
    +  add_candidates(lin_reg_res) %>%
    +  add_candidates(svm_res)
    +
    +tree_frogs_data_st
     #> # A data stack with 3 model definitions and 11 candidate members:
     #> #   knn_res: 4 model configurations
     #> #   lin_reg_res: 1 model configuration
     #> #   svm_res: 6 model configurations
    -#> # Outcome: latency (numeric)
    +#> # Outcome: latency (numeric)

    As mentioned before, under the hood, a data_stack object is really just a tibble with some extra attributes. Checking out the actual data:

    -
    -as_tibble(tree_frogs_data_st)
    +
    +as_tibble(tree_frogs_data_st)
     #> # A tibble: 429 x 12
     #>    latency knn_res_1_1 knn_res_1_2 knn_res_1_3 knn_res_1_4 lin_reg_res_1_1
     #>      <dbl>       <dbl>       <dbl>       <dbl>       <dbl>           <dbl>
    -#>  1     360      -0.343      -0.427      -0.469      -0.478           194. 
    -#>  2     106      -0.336      -0.423      -0.438      -0.444           123. 
    -#>  3     180      -0.343      -0.427      -0.469      -0.478           138. 
    -#>  4      60      -0.350      -0.385      -0.401      -0.407           122. 
    -#>  5      39      -0.251      -0.357      -0.427      -0.441            82.9
    -#>  6     214      -0.420      -0.455      -0.505      -0.515           134. 
    -#>  7      50      -0.336      -0.423      -0.438      -0.444            37.2
    -#>  8     224      -0.336      -0.423      -0.438      -0.444           125. 
    -#>  9      63      -0.420      -0.455      -0.505      -0.515            40.3
    -#> 10      33      -0.336      -0.423      -0.438      -0.444            38.3
    -#> # … with 419 more rows, and 6 more variables: svm_res_1_6 <dbl>,
    -#> #   svm_res_1_5 <dbl>, svm_res_1_3 <dbl>, svm_res_1_1 <dbl>, svm_res_1_2 <dbl>,
    -#> #   svm_res_1_4 <dbl>
    +#> 1 142 -0.496 -0.478 -0.492 -0.494 114. +#> 2 79 -0.381 -0.446 -0.542 -0.553 78.6 +#> 3 50 -0.311 -0.352 -0.431 -0.438 81.5 +#> 4 68 -0.312 -0.368 -0.463 -0.473 78.6 +#> 5 64 -0.496 -0.478 -0.492 -0.494 36.5 +#> 6 52 -0.391 -0.412 -0.473 -0.482 124. +#> 7 39 -0.523 -0.549 -0.581 -0.587 35.2 +#> 8 46 -0.523 -0.549 -0.581 -0.587 37.1 +#> 9 137 -0.287 -0.352 -0.447 -0.456 78.8 +#> 10 73 -0.523 -0.549 -0.581 -0.587 38.8 +#> # … with 419 more rows, and 6 more variables: svm_res_1_5 <dbl>, +#> # svm_res_1_6 <dbl>, svm_res_1_1 <dbl>, svm_res_1_4 <dbl>, svm_res_1_3 <dbl>, +#> # svm_res_1_2 <dbl>

The first column gives the true response value, and the remaining columns give the assessment set predictions for each ensemble member. Since we’re in the regression case, there’s only one column per ensemble member. In classification settings, there are as many columns per candidate ensemble member as there are levels of the outcome variable.

That’s it! We’re now ready to evaluate how to combine predictions from each candidate ensemble member.

    @@ -451,85 +420,78 @@

    Fit the stack

    The outputs from each of these candidate ensemble members are highly correlated, so the blend_predictions() function performs regularization to figure out how we can combine the outputs from the stack members to come up with a final prediction.

    -
    -tree_frogs_model_st <-
    -  tree_frogs_data_st %>%
    -  blend_predictions()
    -#> Loading required package: Matrix
    -#> 
    -#> Attaching package: 'Matrix'
    -#> The following objects are masked from 'package:tidyr':
    -#> 
    -#>     expand, pack, unpack
    -#> Loaded glmnet 4.0-2
    +
    +tree_frogs_model_st <-
    +  tree_frogs_data_st %>%
    +  blend_predictions()

The blend_predictions() function determines how member model output will ultimately be combined in the final prediction by fitting a LASSO model on the data stack, predicting the true assessment set outcome using the predictions from each of the candidate members. Candidates with nonzero stacking coefficients become members.
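Conceptually, the resulting blended prediction is a sparse (and, by default, non-negative) weighted sum of the member predictions. A toy sketch with made-up numbers (the member columns, coefficients, and intercept below are hypothetical, not taken from this fit):

# hypothetical member predictions for two test observations
member_preds_toy <- matrix(
  c(100, 50, 90, 60),
  ncol = 2,
  dimnames = list(NULL, c("member_1", "member_2"))
)

# hypothetical stacking coefficients
coefs_toy <- c(member_1 = 0.6, member_2 = 0.3)

# blended prediction: intercept plus the weighted sum of member predictions
5 + member_preds_toy %*% coefs_toy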

    To make sure that we have the right trade-off between minimizing the number of members and optimizing performance, we can use the autoplot() method:

    -
    -theme_set(theme_bw())
    -autoplot(tree_frogs_model_st)
    +
    +theme_set(theme_bw())
    +autoplot(tree_frogs_model_st)

    To show the relationship more directly:

    -
    -autoplot(tree_frogs_model_st, type = "members")
    +
    +autoplot(tree_frogs_model_st, type = "members")

    If these results were not good enough, blend_predictions() could be called again with different values of penalty. As it is, blend_predictions() picks the penalty parameter with the numerically optimal results. To see the top results:

    -
    -autoplot(tree_frogs_model_st, type = "weights")
    +
    +autoplot(tree_frogs_model_st, type = "weights")
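If we did want to re-blend, we could hand blend_predictions() a custom grid of penalty values to search over. A sketch, assuming (per the function's signature) that the penalty argument accepts a numeric vector of candidate values:

tree_frogs_model_st <-
  tree_frogs_data_st %>%
  blend_predictions(penalty = 10^seq(-2, -0.5, length = 20))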

    Now that we know how to combine our model output, we can fit the candidates with non-zero stacking coefficients on the full training set.

    -
    -tree_frogs_model_st <-
    -  tree_frogs_model_st %>%
    -  fit_members()
    +
    +tree_frogs_model_st <-
    +  tree_frogs_model_st %>%
    +  fit_members()

    Model stacks can be thought of as a group of fitted member models and a set of instructions on how to combine their predictions.

    To identify which model configurations were assigned what stacking coefficients, we can make use of the collect_parameters() function:

    -
    -collect_parameters(tree_frogs_model_st, "svm_res")
    +
    +collect_parameters(tree_frogs_model_st, "svm_res")
     #> # A tibble: 6 x 4
    -#>   member          cost    sigma  coef
    -#>   <chr>          <dbl>    <dbl> <dbl>
    -#> 1 svm_res_1_1  0.510   4.24e- 4  326.
    -#> 2 svm_res_1_2  4.57    1.29e- 3  160.
    -#> 3 svm_res_1_3  0.0378  8.18e- 1    0 
    -#> 4 svm_res_1_4 14.1     8.16e- 7    0 
    -#> 5 svm_res_1_5  0.0126  1.25e- 8    0 
    -#> 6 svm_res_1_6  0.00154 1.63e-10    0
    +#> member cost sigma coef +#> <chr> <dbl> <dbl> <dbl> +#> 1 svm_res_1_1 0.153 0.0196 13.9 +#> 2 svm_res_1_2 5.76 0.00000856 516. +#> 3 svm_res_1_3 1.72 0.0000239 0 +#> 4 svm_res_1_4 0.192 0.0000000552 0 +#> 5 svm_res_1_5 0.00315 0.00000000359 0 +#> 6 svm_res_1_6 0.00733 0.0326 12.5

    This object is now ready to predict with new data!

    -
    -tree_frogs_test <- 
    -  tree_frogs_test %>%
    -  bind_cols(predict(tree_frogs_model_st, .))
    +
    +tree_frogs_test <- 
    +  tree_frogs_test %>%
    +  bind_cols(predict(tree_frogs_model_st, .))

    Juxtaposing the predictions with the true data:

    -
    -ggplot(tree_frogs_test) +
    -  aes(x = latency, 
    -      y = .pred) +
    -  geom_point() + 
    -  coord_obs_pred()
    +
    +ggplot(tree_frogs_test) +
    +  aes(x = latency, 
    +      y = .pred) +
    +  geom_point() + 
    +  coord_obs_pred()

Looks like our predictions were pretty strong! How do the stacks predictions perform, though, as compared to the members’ predictions? We can use the members = TRUE argument to predict() to generate predictions from each of the ensemble members.

    -
    -member_preds <- 
    -  tree_frogs_test %>%
    -  select(latency) %>%
    -  bind_cols(predict(tree_frogs_model_st, tree_frogs_test, members = TRUE))
    +
    +member_preds <- 
    +  tree_frogs_test %>%
    +  select(latency) %>%
    +  bind_cols(predict(tree_frogs_model_st, tree_frogs_test, members = TRUE))

    Now, evaluating the root mean squared error from each model:

    -
    -map_dfr(member_preds, rmse, truth = latency, data = member_preds) %>%
    -  mutate(member = colnames(member_preds))
    +
    +map_dfr(member_preds, rmse, truth = latency, data = member_preds) %>%
    +  mutate(member = colnames(member_preds))
     #> # A tibble: 7 x 4
     #>   .metric .estimator .estimate member         
     #>   <chr>   <chr>          <dbl> <chr>          
     #> 1 rmse    standard         0   latency        
    -#> 2 rmse    standard        44.5 .pred          
    -#> 3 rmse    standard       104.  knn_res_1_1    
    -#> 4 rmse    standard       104.  knn_res_1_2    
    -#> 5 rmse    standard        44.7 lin_reg_res_1_1
    -#> 6 rmse    standard       104.  svm_res_1_1    
    -#> 7 rmse    standard       104.  svm_res_1_2
    +#> 2 rmse standard 55.5 .pred +#> 3 rmse standard 114. knn_res_1_4 +#> 4 rmse standard 55.5 lin_reg_res_1_1 +#> 5 rmse standard 114. svm_res_1_6 +#> 6 rmse standard 114. svm_res_1_1 +#> 7 rmse standard 114. svm_res_1_2

As we can see, the stacked ensemble outperforms each of the member models, though it is closely followed by one of its members.

Voilà! You’ve now made use of the stacks package to predict red-eyed tree frog embryo hatching using a stacked ensemble! The full visual outline for these steps can be found here.

    diff --git a/docs/articles/basics_files/figure-html/members-plot-1.png b/docs/articles/basics_files/figure-html/members-plot-1.png index a2ce7d67..b2aabd36 100644 Binary files a/docs/articles/basics_files/figure-html/members-plot-1.png and b/docs/articles/basics_files/figure-html/members-plot-1.png differ diff --git a/docs/articles/basics_files/figure-html/penalty-plot-1.png b/docs/articles/basics_files/figure-html/penalty-plot-1.png index c8f6de63..c087b798 100644 Binary files a/docs/articles/basics_files/figure-html/penalty-plot-1.png and b/docs/articles/basics_files/figure-html/penalty-plot-1.png differ diff --git a/docs/articles/basics_files/figure-html/unnamed-chunk-25-1.png b/docs/articles/basics_files/figure-html/unnamed-chunk-25-1.png index f1608224..a3315f35 100644 Binary files a/docs/articles/basics_files/figure-html/unnamed-chunk-25-1.png and b/docs/articles/basics_files/figure-html/unnamed-chunk-25-1.png differ diff --git a/docs/articles/basics_files/figure-html/unnamed-chunk-3-1.png b/docs/articles/basics_files/figure-html/unnamed-chunk-3-1.png index f90be06f..1db347e6 100644 Binary files a/docs/articles/basics_files/figure-html/unnamed-chunk-3-1.png and b/docs/articles/basics_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/docs/articles/basics_files/figure-html/weight-plot-1.png b/docs/articles/basics_files/figure-html/weight-plot-1.png index f1a8e005..2150fe97 100644 Binary files a/docs/articles/basics_files/figure-html/weight-plot-1.png and b/docs/articles/basics_files/figure-html/weight-plot-1.png differ diff --git a/docs/articles/basics_files/header-attrs-2.6/header-attrs.js b/docs/articles/basics_files/header-attrs-2.6/header-attrs.js new file mode 100644 index 00000000..dd57d92e --- /dev/null +++ b/docs/articles/basics_files/header-attrs-2.6/header-attrs.js @@ -0,0 +1,12 @@ +// Pandoc 2.9 adds attributes on both header and div. We remove the former (to +// be compatible with the behavior of Pandoc < 2.8). +document.addEventListener('DOMContentLoaded', function(e) { + var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); + var i, h, a; + for (i = 0; i < hs.length; i++) { + h = hs[i]; + if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 + a = h.attributes; + while (a.length > 0) h.removeAttribute(a[0].name); + } +}); diff --git a/docs/articles/classification.html b/docs/articles/classification.html index 61a174a3..a0f02286 100644 --- a/docs/articles/classification.html +++ b/docs/articles/classification.html @@ -50,7 +50,7 @@ stacks
    @@ -58,7 +58,7 @@ @@ -94,14 +99,13 @@ - -
    +
    @@ -109,244 +113,240 @@

    Classification Models With stacks

    In this vignette, we’ll tackle a multiclass classification problem using the stacks package. This vignette assumes that you’re familiar with tidymodels “proper,” as well as the basic grammar of the package, and have seen it implemented on numeric data; if this is not the case, check out the “Getting Started With stacks” vignette!

    - +

    In this example, we’ll make use of the tree_frogs data exported with stacks, giving experimental results on hatching behavior of red-eyed tree frog embryos!

    -

    Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimulus from their environment. To do so, they subjected the embryos at varying developmental stages to “predator stimulus” by jiggling the embryos with a blunt probe. Beforehand, though some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ.) Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

    -

    In this article, we’ll use most all of the variables in tree_frogs to predict reflex, a measure of ear function called the vestibulo-ocular reflex, categorized into bins. Ear function increases from factor levels “low”, to “mid”, to “full”.

    -
    -data("tree_frogs")
    +

    Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimulus from their environment. To do so, they subjected the embryos at varying developmental stages to “predator stimulus” by jiggling the embryos with a blunt probe. Beforehand, though, some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ). Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

    +

In this article, we’ll use nearly all of the variables in tree_frogs to predict reflex, a measure of ear function called the vestibulo-ocular reflex (VOR), categorized into bins. Ear function increases from factor levels “low”, to “mid”, to “full”.

    +
    +data("tree_frogs")
     
     # subset the data
    -tree_frogs <- tree_frogs %>%
    -  select(-c(clutch, latency))
    +tree_frogs <- tree_frogs %>% + select(-c(clutch, latency))

    Let’s plot the data to get a sense for how separable these groups are.

    -
    -library(ggplot2)
    +
    +library(ggplot2)
     
    -ggplot(tree_frogs) +
    -  aes(x = treatment, y = age, color = reflex) +
    -  geom_jitter() +
    -  labs(y = "Embryo Age (s)", 
    -       x = "treatment",
    -       color = "Response")
    +ggplot(tree_frogs) + + aes(x = treatment, y = age, color = reflex) + + geom_jitter() + + labs(y = "Embryo Age (s)", + x = "treatment", + color = "Response")

    It looks like the embryo age is pretty effective at picking out embryos with full VOR function, but the problem gets tougher for the less developed embryos! Let’s see how well the stacked ensemble can classify these tree frogs.

    Defining candidate ensemble members

    As in the numeric prediction setting, defining the candidate ensemble members is undoubtedly the longest part of the ensembling process with stacks. First, splitting up the training data, generating resamples, and setting some options that will be used by each model definition.

    -
    -# some setup: resampling and a basic recipe
    -set.seed(1)
    +
    +# some setup: resampling and a basic recipe
    +set.seed(1)
     
    -tree_frogs_split <- initial_split(tree_frogs)
    -tree_frogs_train <- training(tree_frogs_split)
    -tree_frogs_test  <- testing(tree_frogs_split)
    +tree_frogs_split <- initial_split(tree_frogs)
    +tree_frogs_train <- training(tree_frogs_split)
    +tree_frogs_test  <- testing(tree_frogs_split)
     
    -folds <- rsample::vfold_cv(tree_frogs_train, v = 5)
    +folds <- rsample::vfold_cv(tree_frogs_train, v = 5)
     
    -tree_frogs_rec <- 
    -  recipe(reflex ~ ., data = tree_frogs_train) %>%
    -  step_dummy(all_nominal(), -reflex) %>%
    -  step_zv(all_predictors())
    +tree_frogs_rec <- 
    +  recipe(reflex ~ ., data = tree_frogs_train) %>%
    +  step_dummy(all_nominal(), -reflex) %>%
    +  step_zv(all_predictors())
     
    -tree_frogs_wflow <- 
    -  workflow() %>% 
    -  add_recipe(tree_frogs_rec)
    +tree_frogs_wflow <- + workflow() %>% + add_recipe(tree_frogs_rec)

    We also need to use the same control settings as in the numeric response setting:

    -
    -ctrl_grid <- control_stack_grid()
    +
    +ctrl_grid <- control_stack_grid()

    We’ll define two different model definitions to try to predict reflex—a random forest and a neural network.

    Starting out with a random forest:

    -
    -rand_forest_spec <- 
    -  rand_forest(
    -    mtry = tune(),
    -    min_n = tune(),
    -    trees = 500
    -  ) %>%
    -  set_mode("classification") %>%
    -  set_engine("ranger")
    +
    +rand_forest_spec <- 
    +  rand_forest(
    +    mtry = tune(),
    +    min_n = tune(),
    +    trees = 500
    +  ) %>%
    +  set_mode("classification") %>%
    +  set_engine("ranger")
     
    -rand_forest_wflow <-
    -  tree_frogs_wflow %>%
    -  add_model(rand_forest_spec)
    +rand_forest_wflow <-
    +  tree_frogs_wflow %>%
    +  add_model(rand_forest_spec)
     
    -rand_forest_res <- 
    -  tune_grid(
    -    object = rand_forest_wflow, 
    -    resamples = folds, 
    -    grid = 10,
    -    control = ctrl_grid
    -  )
    +rand_forest_res <- + tune_grid( + object = rand_forest_wflow, + resamples = folds, + grid = 10, + control = ctrl_grid + )

    Now, moving on to the neural network model definition:

    -
    -nnet_spec <-
    -  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
    -  set_mode("classification") %>%
    -  set_engine("nnet")
    +
    +nnet_spec <-
    +  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
    +  set_mode("classification") %>%
    +  set_engine("nnet")
     
    -nnet_rec <- 
    -  tree_frogs_rec %>% 
    -  step_normalize(all_predictors())
    +nnet_rec <- 
    +  tree_frogs_rec %>% 
    +  step_normalize(all_predictors())
     
    -nnet_wflow <- 
    -  tree_frogs_wflow %>%
    -  add_model(nnet_spec)
    +nnet_wflow <- 
    +  tree_frogs_wflow %>%
    +  add_model(nnet_spec)
     
    -nnet_res <-
    -  tune_grid(
    -    object = nnet_wflow, 
    -    resamples = folds, 
    -    grid = 10,
    -    control = ctrl_grid
    -  )
    +nnet_res <- + tune_grid( + object = nnet_wflow, + resamples = folds, + grid = 10, + control = ctrl_grid + )

    With these model definitions fully specified, we’re ready to start putting together an ensemble!

    Putting together a stack

    Building the stacked ensemble, now, only takes a few lines:

    -
    -tree_frogs_model_st <- 
    +
    +tree_frogs_model_st <- 
       # initialize the stack
    -  stacks() %>%
    +  stacks() %>%
       # add candidate members
    -  add_candidates(rand_forest_res) %>%
    -  add_candidates(nnet_res) %>%
    +  add_candidates(rand_forest_res) %>%
    +  add_candidates(nnet_res) %>%
       # determine how to combine their predictions
    -  blend_predictions() %>%
    +  blend_predictions() %>%
       # fit the candidates with nonzero stacking coefficients
    -  fit_members()
    +  fit_members()
     
    -tree_frogs_model_st
    +tree_frogs_model_st
     #> # A tibble: 10 x 4
    -#>    member                          type        weight class
    -#>    <chr>                           <chr>        <dbl> <chr>
    -#>  1 .pred_full_nnet_res_1_09        mlp          77.7  low  
    -#>  2 .pred_mid_rand_forest_res_1_03  rand_forest  31.6  low  
    -#>  3 .pred_mid_rand_forest_res_1_04  rand_forest  21.2  mid  
    -#>  4 .pred_mid_rand_forest_res_1_02  rand_forest  16.3  mid  
    -#>  5 .pred_mid_rand_forest_res_1_06  rand_forest  12.8  low  
    -#>  6 .pred_full_rand_forest_res_1_06 rand_forest  11.5  full 
    -#>  7 .pred_mid_rand_forest_res_1_09  rand_forest  11.1  low  
    -#>  8 .pred_mid_rand_forest_res_1_08  rand_forest  10.4  mid  
    -#>  9 .pred_mid_nnet_res_1_03         mlp          10.0  mid  
    -#> 10 .pred_mid_rand_forest_res_1_01  rand_forest   9.20 mid
    +#> member type weight class +#> <chr> <chr> <dbl> <chr> +#> 1 .pred_full_nnet_res_1_04 mlp 62.3 low +#> 2 .pred_mid_rand_forest_res_1_01 rand_forest 44.0 low +#> 3 .pred_mid_nnet_res_1_04 mlp 28.2 low +#> 4 .pred_full_nnet_res_1_07 mlp 22.7 low +#> 5 .pred_mid_rand_forest_res_1_10 rand_forest 22.1 mid +#> 6 .pred_mid_rand_forest_res_1_05 rand_forest 21.8 mid +#> 7 .pred_mid_nnet_res_1_08 mlp 21.1 low +#> 8 .pred_mid_rand_forest_res_1_07 rand_forest 20.1 low +#> 9 .pred_mid_nnet_res_1_09 mlp 13.1 mid +#> 10 .pred_mid_rand_forest_res_1_09 rand_forest 9.96 mid

    To make sure that we have the right trade-off between minimizing the number of members and optimizing performance, we can use the autoplot() method:

    -
    -theme_set(theme_bw())
    -autoplot(tree_frogs_model_st)
    +
    +theme_set(theme_bw())
    +autoplot(tree_frogs_model_st)

    To show the relationship more directly:

    -
    -autoplot(tree_frogs_model_st, type = "members")
    +
    +autoplot(tree_frogs_model_st, type = "members")

    If these results were not good enough, blend_predictions() could be called again with different values of penalty. As it is, blend_predictions() picks the penalty parameter with the numerically optimal results. To see the top results:

    -
    -autoplot(tree_frogs_model_st, type = "weights")
    +
    +autoplot(tree_frogs_model_st, type = "weights")

    There are multiple facets since the ensemble members can have different effects on different classes.

    To identify which model configurations were assigned what stacking coefficients, we can make use of the collect_parameters() function:

    -
    -collect_parameters(tree_frogs_model_st, "rand_forest_res")
    +
    +collect_parameters(tree_frogs_model_st, "rand_forest_res")
     #> # A tibble: 60 x 6
    -#>    member                mtry min_n class terms                            coef
    -#>    <chr>                <int> <int> <chr> <chr>                           <dbl>
    -#>  1 rand_forest_res_1_01     2    33 low   .pred_mid_rand_forest_res_1_01   0   
    -#>  2 rand_forest_res_1_01     2    33 low   .pred_full_rand_forest_res_1_01  0   
    -#>  3 rand_forest_res_1_01     2    33 mid   .pred_mid_rand_forest_res_1_01   9.20
    -#>  4 rand_forest_res_1_01     2    33 mid   .pred_full_rand_forest_res_1_01  0   
    -#>  5 rand_forest_res_1_01     2    33 full  .pred_mid_rand_forest_res_1_01   0   
    -#>  6 rand_forest_res_1_01     2    33 full  .pred_full_rand_forest_res_1_01  0   
    -#>  7 rand_forest_res_1_02     5     6 low   .pred_mid_rand_forest_res_1_02   0   
    -#>  8 rand_forest_res_1_02     5     6 low   .pred_full_rand_forest_res_1_02  0   
    -#>  9 rand_forest_res_1_02     5     6 mid   .pred_mid_rand_forest_res_1_02  16.3 
    -#> 10 rand_forest_res_1_02     5     6 mid   .pred_full_rand_forest_res_1_02  0   
    -#> # … with 50 more rows
    +#> member mtry min_n class terms coef +#> <chr> <int> <int> <chr> <chr> <dbl> +#> 1 rand_forest_res_1_01 1 26 low .pred_mid_rand_forest_res_1_01 44.0 +#> 2 rand_forest_res_1_01 1 26 low .pred_full_rand_forest_res_1_01 0 +#> 3 rand_forest_res_1_01 1 26 mid .pred_mid_rand_forest_res_1_01 0 +#> 4 rand_forest_res_1_01 1 26 mid .pred_full_rand_forest_res_1_01 0 +#> 5 rand_forest_res_1_01 1 26 full .pred_mid_rand_forest_res_1_01 0 +#> 6 rand_forest_res_1_01 1 26 full .pred_full_rand_forest_res_1_01 0 +#> 7 rand_forest_res_1_02 2 33 low .pred_mid_rand_forest_res_1_02 0 +#> 8 rand_forest_res_1_02 2 33 low .pred_full_rand_forest_res_1_02 0 +#> 9 rand_forest_res_1_02 2 33 mid .pred_mid_rand_forest_res_1_02 6.34 +#> 10 rand_forest_res_1_02 2 33 mid .pred_full_rand_forest_res_1_02 0.329 +#> # … with 50 more rows

    This object is now ready to predict with new data!

    -
    -tree_frogs_pred <-
    -  tree_frogs_test %>%
    -  bind_cols(predict(tree_frogs_model_st, ., type = "prob"))
    +
    +tree_frogs_pred <-
    +  tree_frogs_test %>%
    +  bind_cols(predict(tree_frogs_model_st, ., type = "prob"))

    Computing the ROC AUC for the model:

    -
    -yardstick::roc_auc(
    -  tree_frogs_pred,
    -  truth = reflex,
    -  contains(".pred_")
    -  )
    +
    +yardstick::roc_auc(
    +  tree_frogs_pred,
    +  truth = reflex,
    +  contains(".pred_")
    +  )

    Looks like our predictions were pretty strong! How do the stacks predictions perform, though, as compared to the members’ predictions? We can use the members argument to generate predictions from each of the ensemble members.

    -
    -tree_frogs_pred <-
    -  tree_frogs_test %>%
    -  select(reflex) %>%
    -  bind_cols(
    -    predict(
    -      tree_frogs_model_st,
    -      tree_frogs_test,
    -      type = "class",
    -      members = TRUE
    -      )
    -    )
    +
    +tree_frogs_pred <-
    +  tree_frogs_test %>%
    +  select(reflex) %>%
    +  bind_cols(
    +    predict(
    +      tree_frogs_model_st,
    +      tree_frogs_test,
    +      type = "class",
    +      members = TRUE
    +      )
    +    )
     
    -tree_frogs_pred
    -#> # A tibble: 303 x 20
    -#>    reflex .pred_class .pred_class_ran… .pred_class_ran… .pred_class_ran…
    -#>    <fct>  <fct>       <fct>            <fct>            <fct>           
    -#>  1 full   low         full             full             full            
    -#>  2 low    mid         low              low              low             
    -#>  3 full   low         full             full             full            
    -#>  4 low    mid         low              low              low             
    -#>  5 full   low         full             full             full            
    -#>  6 full   low         full             full             full            
    -#>  7 mid    mid         low              low              low             
    -#>  8 mid    full        low              mid              mid             
    -#>  9 low    full        mid              mid              mid             
    -#> 10 full   low         full             full             full            
    -#> # … with 293 more rows, and 15 more variables:
    +tree_frogs_pred
    +#> # A tibble: 303 x 18
    +#>    reflex .pred_class .pred_class_rand_f… .pred_class_rand_f… .pred_class_rand_…
    +#>    <fct>  <fct>       <fct>               <fct>               <fct>             
    +#>  1 full   low         full                full                full              
    +#>  2 mid    full        low                 mid                 mid               
    +#>  3 mid    full        mid                 mid                 mid               
    +#>  4 mid    full        low                 low                 low               
    +#>  5 full   low         full                full                full              
    +#>  6 full   low         full                full                full              
    +#>  7 full   low         full                full                full              
    +#>  8 full   low         full                full                full              
    +#>  9 full   low         full                full                full              
    +#> 10 full   low         full                full                full              
    +#> # … with 293 more rows, and 13 more variables:
    +#> #   .pred_class_rand_forest_res_1_05 <fct>, .pred_class_nnet_res_1_04 <fct>,
    +#> #   .pred_class_nnet_res_1_08 <fct>, .pred_class_nnet_res_1_03 <fct>,
    +#> #   .pred_class_nnet_res_1_10 <fct>, .pred_class_nnet_res_1_07 <fct>,
    +#> #   .pred_class_nnet_res_1_05 <fct>, .pred_class_rand_forest_res_1_10 <fct>,
    +#> #   .pred_class_rand_forest_res_1_02 <fct>,
    +#> #   .pred_class_rand_forest_res_1_09 <fct>,
     #> #   .pred_class_rand_forest_res_1_06 <fct>,
    -#> #   .pred_class_rand_forest_res_1_10 <fct>,
    -#> #   .pred_class_rand_forest_res_1_07 <fct>, .pred_class_nnet_res_1_08 <fct>,
    -#> #   .pred_class_nnet_res_1_07 <fct>, .pred_class_nnet_res_1_09 <fct>,
    -#> #   .pred_class_nnet_res_1_10 <fct>, .pred_class_nnet_res_1_06 <fct>,
    -#> #   .pred_class_nnet_res_1_02 <fct>, .pred_class_nnet_res_1_01 <fct>,
    -#> #   .pred_class_rand_forest_res_1_04 <fct>,
    -#> #   .pred_class_rand_forest_res_1_01 <fct>,
    -#> #   .pred_class_rand_forest_res_1_08 <fct>,
    -#> #   .pred_class_rand_forest_res_1_02 <fct>, .pred_class_nnet_res_1_03 <fct>
    +#> #   .pred_class_rand_forest_res_1_08 <fct>, .pred_class_nnet_res_1_09 <fct>
     
    -map_dfr(
    -  setNames(colnames(tree_frogs_pred), colnames(tree_frogs_pred)),
    -  ~mean(tree_frogs_pred$reflex == pull(tree_frogs_pred, .x))
    -) %>%
    -  pivot_longer(c(everything(), -reflex))
    -#> # A tibble: 19 x 3
    -#>    reflex name                              value
    -#>     <dbl> <chr>                             <dbl>
    -#>  1      1 .pred_class                      0.0528
    -#>  2      1 .pred_class_rand_forest_res_1_03 0.815 
    -#>  3      1 .pred_class_rand_forest_res_1_09 0.848 
    -#>  4      1 .pred_class_rand_forest_res_1_05 0.842 
    -#>  5      1 .pred_class_rand_forest_res_1_06 0.868 
    -#>  6      1 .pred_class_rand_forest_res_1_10 0.851 
    -#>  7      1 .pred_class_rand_forest_res_1_07 0.845 
    -#>  8      1 .pred_class_nnet_res_1_08        0.518 
    -#>  9      1 .pred_class_nnet_res_1_07        0.805 
    -#> 10      1 .pred_class_nnet_res_1_09        0.518 
    -#> 11      1 .pred_class_nnet_res_1_10        0.518 
    -#> 12      1 .pred_class_nnet_res_1_06        0.518 
    -#> 13      1 .pred_class_nnet_res_1_02        0.518 
    -#> 14      1 .pred_class_nnet_res_1_01        0.518 
    -#> 15      1 .pred_class_rand_forest_res_1_04 0.838 
    -#> 16      1 .pred_class_rand_forest_res_1_01 0.835 
    -#> 17      1 .pred_class_rand_forest_res_1_08 0.832 
    -#> 18      1 .pred_class_rand_forest_res_1_02 0.875 
    -#> 19      1 .pred_class_nnet_res_1_03        0.809
    -

    Voila! You’ve now made use of the stacks package to predict tree frog embryo ear function using a stacked ensemble!

    +map_dfr( + setNames(colnames(tree_frogs_pred), colnames(tree_frogs_pred)), + ~mean(tree_frogs_pred$reflex == pull(tree_frogs_pred, .x)) +) %>% + pivot_longer(c(everything(), -reflex)) +#> # A tibble: 17 x 3 +#> reflex name value +#> <dbl> <chr> <dbl> +#> 1 1 .pred_class 0 +#> 2 1 .pred_class_rand_forest_res_1_01 0.845 +#> 3 1 .pred_class_rand_forest_res_1_04 0.871 +#> 4 1 .pred_class_rand_forest_res_1_07 0.865 +#> 5 1 .pred_class_rand_forest_res_1_05 0.845 +#> 6 1 .pred_class_nnet_res_1_04 0.558 +#> 7 1 .pred_class_nnet_res_1_08 0.558 +#> 8 1 .pred_class_nnet_res_1_03 0.558 +#> 9 1 .pred_class_nnet_res_1_10 0.558 +#> 10 1 .pred_class_nnet_res_1_07 0.838 +#> 11 1 .pred_class_nnet_res_1_05 0.558 +#> 12 1 .pred_class_rand_forest_res_1_10 0.871 +#> 13 1 .pred_class_rand_forest_res_1_02 0.875 +#> 14 1 .pred_class_rand_forest_res_1_09 0.881 +#> 15 1 .pred_class_rand_forest_res_1_06 0.871 +#> 16 1 .pred_class_rand_forest_res_1_08 0.881 +#> 17 1 .pred_class_nnet_res_1_09 0.845
    +

    Voilà! You’ve now made use of the stacks package to predict tree frog embryo ear function using a stacked ensemble!

    diff --git a/docs/articles/classification_files/figure-html/members-plot-1.png b/docs/articles/classification_files/figure-html/members-plot-1.png index e1c561f9..f08e4110 100644 Binary files a/docs/articles/classification_files/figure-html/members-plot-1.png and b/docs/articles/classification_files/figure-html/members-plot-1.png differ diff --git a/docs/articles/classification_files/figure-html/penalty-plot-1.png b/docs/articles/classification_files/figure-html/penalty-plot-1.png index 057b2247..3c69d91f 100644 Binary files a/docs/articles/classification_files/figure-html/penalty-plot-1.png and b/docs/articles/classification_files/figure-html/penalty-plot-1.png differ diff --git a/docs/articles/classification_files/figure-html/unnamed-chunk-3-1.png b/docs/articles/classification_files/figure-html/unnamed-chunk-3-1.png index 82fd720b..21c6b747 100644 Binary files a/docs/articles/classification_files/figure-html/unnamed-chunk-3-1.png and b/docs/articles/classification_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/docs/articles/classification_files/figure-html/weight-plot-1.png b/docs/articles/classification_files/figure-html/weight-plot-1.png index f7456835..2a4d648b 100644 Binary files a/docs/articles/classification_files/figure-html/weight-plot-1.png and b/docs/articles/classification_files/figure-html/weight-plot-1.png differ diff --git a/docs/articles/classification_files/header-attrs-2.6/header-attrs.js b/docs/articles/classification_files/header-attrs-2.6/header-attrs.js new file mode 100644 index 00000000..dd57d92e --- /dev/null +++ b/docs/articles/classification_files/header-attrs-2.6/header-attrs.js @@ -0,0 +1,12 @@ +// Pandoc 2.9 adds attributes on both header and div. We remove the former (to +// be compatible with the behavior of Pandoc < 2.8). +document.addEventListener('DOMContentLoaded', function(e) { + var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); + var i, h, a; + for (i = 0; i < hs.length; i++) { + h = hs[i]; + if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 + a = h.attributes; + while (a.length > 0) h.removeAttribute(a[0].name); + } +}); diff --git a/docs/articles/index.html b/docs/articles/index.html index e86adef5..2ac22925 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -94,7 +94,7 @@ stacks
    @@ -102,7 +102,7 @@ diff --git a/docs/authors.html b/docs/authors.html index 3a50ce2a..8254865e 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -94,7 +94,7 @@ stacks @@ -102,7 +102,7 @@ diff --git a/docs/index.html b/docs/index.html index 6e8faa42..bbe15203 100644 --- a/docs/index.html +++ b/docs/index.html @@ -54,7 +54,7 @@ stacks @@ -62,7 +62,7 @@ @@ -123,11 +128,18 @@

    You can install the package with the following code:

    -
    -install.packages("stacks")
    -

    Install the (unstable) development version with:

    -
    -remotes::install_github("tidymodels/stacks", ref = "main")
    +
    +install.packages("stacks")
    +

    Install the development version with:

    +
    +remotes::install_github("tidymodels/stacks", ref = "main")
    +

    stacks is generalized with respect to:

    +
    +

    stacks uses a regularized linear model to combine predictions from ensemble members, though this model type is only one of many possible learning algorithms that could be used to fit a stacked ensemble model. For implementations of additional ensemble learning algorithms, check out h2o and SuperLearner.

    Rather than diving right into the implementation, we’ll focus here on how the pieces fit together, conceptually, in building an ensemble with stacks. See the basics vignette for an example of the API in action!

    diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 1b420f2d..cd3be80d 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -1,10 +1,10 @@ -pandoc: 2.7.3 +pandoc: 2.11.2 pkgdown: 1.6.1 pkgdown_sha: ~ articles: basics: basics.html classification: classification.html -last_built: 2020-11-30T17:23Z +last_built: 2021-04-19T16:21Z urls: reference: https://stacks.tidymodels.org/reference article: https://stacks.tidymodels.org/articles diff --git a/docs/reference/add_candidates.html b/docs/reference/add_candidates.html index 78125f37..58521609 100644 --- a/docs/reference/add_candidates.html +++ b/docs/reference/add_candidates.html @@ -108,7 +108,7 @@ stacks
    @@ -116,7 +116,7 @@ @@ -156,7 +161,7 @@
    @@ -177,12 +182,12 @@

    Add model definitions to a data stack

    evaluated using the blend_predictions() function.

    -
    add_candidates(
    -  data_stack,
    -  candidates,
    -  name = deparse(substitute(candidates)),
    -  ...
    -)
    +
    add_candidates(
    +  data_stack,
    +  candidates,
    +  name = deparse(substitute(candidates)),
    +  ...
    +)

    Arguments

    @@ -269,89 +274,65 @@

    Examp # put together a data stack using # tuning results for regression models -reg_st <- - stacks() %>% - add_candidates(reg_res_lr) %>% - add_candidates(reg_res_svm) %>% - add_candidates(reg_res_sp) +reg_st <- + stacks() %>% + add_candidates(reg_res_lr) %>% + add_candidates(reg_res_svm) %>% + add_candidates(reg_res_sp) -reg_st +reg_st
    #> # A data stack with 3 model definitions and 15 candidate members: #> # reg_res_lr: 1 model configuration #> # reg_res_svm: 5 model configurations #> # reg_res_sp: 9 model configurations #> # Outcome: latency (numeric)
    # do the same with multinomial classification models -class_st <- - stacks() %>% - add_candidates(class_res_nn) %>% - add_candidates(class_res_rf) +class_st <- + stacks() %>% + add_candidates(class_res_nn) %>% + add_candidates(class_res_rf) -class_st +class_st
    #> # A data stack with 2 model definitions and 11 candidate members: #> # class_res_nn: 1 model configuration #> # class_res_rf: 10 model configurations #> # Outcome: reflex (factor)
    # ...or binomial classification models -log_st <- - stacks() %>% - add_candidates(log_res_nn) %>% - add_candidates(log_res_rf) +log_st <- + stacks() %>% + add_candidates(log_res_nn) %>% + add_candidates(log_res_rf) -log_st +log_st
    #> # A data stack with 2 model definitions and 11 candidate members: #> # log_res_nn: 1 model configuration #> # log_res_rf: 10 model configurations #> # Outcome: hatched (factor)
    # use custom names for each model: -log_st2 <- - stacks() %>% - add_candidates(log_res_nn, name = "neural_network") %>% - add_candidates(log_res_rf, name = "random_forest") +log_st2 <- + stacks() %>% + add_candidates(log_res_nn, name = "neural_network") %>% + add_candidates(log_res_rf, name = "random_forest") -log_st2 +log_st2
    #> # A data stack with 2 model definitions and 11 candidate members: #> # neural_network: 1 model configuration #> # random_forest: 10 model configurations #> # Outcome: hatched (factor)
    # these objects would likely then be # passed to blend_predictions(): -log_st2 %>% blend_predictions() -
    #> Loading required package: dplyr
    #> -#> Attaching package: ‘dplyr’
    #> The following objects are masked from ‘package:stats’: -#> -#> filter, lag
    #> The following objects are masked from ‘package:base’: -#> -#> intersect, setdiff, setequal, union
    #> -#> Attaching package: ‘recipes’
    #> The following object is masked from ‘package:stats’: -#> -#> step
    #> For binary classification, the first factor level is assumed to be the event. -#> Use the argument `event_level = "second"` to alter this as needed.
    #> Loading required package: scales
    #> -#> Attaching package: ‘scales’
    #> The following object is masked from ‘package:purrr’: -#> -#> discard
    #> -#> Attaching package: ‘rlang’
    #> The following objects are masked from ‘package:purrr’: -#> -#> %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int, -#> flatten_lgl, flatten_raw, invoke, list_along, modify, prepend, -#> splice
    #> -#> Attaching package: ‘vctrs’
    #> The following object is masked from ‘package:tibble’: -#> -#> data_frame
    #> The following object is masked from ‘package:dplyr’: -#> -#> data_frame
    #> Loading required package: Matrix
    #> -#> Attaching package: ‘Matrix’
    #> The following objects are masked from ‘package:tidyr’: -#> -#> expand, pack, unpack
    #> Loaded glmnet 4.0-2
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> +log_st2 %>% blend_predictions() +
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Out of 11 possible candidate members, the ensemble retained 4. -#> Lasso penalty: 1e-04.
    #> +#> Penalty: 0.001. +#> Mixture: 1.
    #> #> The 4 highest weighted member classes are:
    #> # A tibble: 4 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 .pred_yes_neural_network_1_1 mlp 7.58 -#> 2 .pred_yes_random_forest_1_09 rand_forest 1.49 -#> 3 .pred_yes_random_forest_1_03 rand_forest 1.13 -#> 4 .pred_yes_random_forest_1_01 rand_forest 1.03
    #> +#> 1 .pred_yes_neural_network_1_1 mlp 6.09 +#> 2 .pred_yes_random_forest_1_09 rand_forest 1.84 +#> 3 .pred_yes_random_forest_1_05 rand_forest 1.45 +#> 4 .pred_yes_random_forest_1_06 rand_forest 0.792
    #> #> Members have not yet been fitted with `fit_members()`.
    # }
    diff --git a/docs/reference/autoplot.linear_stack.html b/docs/reference/autoplot.linear_stack.html index 34fc30c9..300fe68d 100644 --- a/docs/reference/autoplot.linear_stack.html +++ b/docs/reference/autoplot.linear_stack.html @@ -95,7 +95,7 @@ stacks @@ -103,7 +103,7 @@ @@ -143,7 +148,7 @@
    @@ -152,7 +157,7 @@

    Plot results of a stacked ensemble model.

    # S3 method for linear_stack
    -autoplot(object, type = "performance", n = Inf, ...)
    +autoplot(object, type = "performance", n = Inf, ...)
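A minimal sketch of how this method might be used, assuming `st` is a blended (or fitted) model stack, and assuming that "members" and "weights" are the other supported `type` values alongside the default "performance":

library(stacks)

# default: ensemble performance across the proposed amounts of regularization
autoplot(st)

# assumption: "members" plots member-level performance, "weights" plots the
# stacking coefficients, and n caps how many members appear in the plot
autoplot(st, type = "members")
autoplot(st, type = "weights", n = 5)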

    Arguments

    diff --git a/docs/reference/axe_model_stack.html b/docs/reference/axe_model_stack.html index 71599fc9..1053bb3a 100644 --- a/docs/reference/axe_model_stack.html +++ b/docs/reference/axe_model_stack.html @@ -100,7 +100,7 @@ stacks @@ -108,7 +108,7 @@ @@ -148,7 +153,7 @@
    @@ -162,19 +167,19 @@

    Axing a model_stack.

    # S3 method for model_stack
    -axe_call(x, verbose = FALSE, ...)
    +axe_call(x, verbose = FALSE, ...)
     
     # S3 method for model_stack
    -axe_ctrl(x, verbose = FALSE, ...)
    +axe_ctrl(x, verbose = FALSE, ...)
     
     # S3 method for model_stack
    -axe_data(x, verbose = FALSE, ...)
    +axe_data(x, verbose = FALSE, ...)
     
     # S3 method for model_stack
    -axe_env(x, verbose = FALSE, ...)
    +axe_env(x, verbose = FALSE, ...)
     
     # S3 method for model_stack
    -axe_fitted(x, verbose = FALSE, ...)
+axe_fitted(x, verbose = FALSE, ...)

    Arguments

    @@ -202,32 +207,32 @@

    Value

    Examples

    # \donttest{ # build a regression model stack -st <- - stacks() %>% - add_candidates(reg_res_lr) %>% - add_candidates(reg_res_sp) %>% - blend_predictions() %>% - fit_members() +st <- + stacks() %>% + add_candidates(reg_res_lr) %>% + add_candidates(reg_res_sp) %>% + blend_predictions() %>% + fit_members() # remove any of the "butcherable" # elements individually -axe_call(st) +axe_call(st)
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Print methods for butchered model stacks are disabled.
    axe_ctrl(st) +#> Print methods for butchered model stacks are disabled.
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Print methods for butchered model stacks are disabled.
    axe_data(st) +#> Print methods for butchered model stacks are disabled.
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Print methods for butchered model stacks are disabled.
    axe_fitted(st) +#> Print methods for butchered model stacks are disabled.
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Print methods for butchered model stacks are disabled.
    axe_env(st) +#> Print methods for butchered model stacks are disabled.
    axe_env(st)
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Print methods for butchered model stacks are disabled.
    # or do it all at once! -butchered_st <- butcher(st, verbose = TRUE) -
    #> Memory released: '2,436,296 B'
    #> x Disabled: `print()`, `summary()`
    #> [1] "9117392 bytes"
    format(object.size(butchered_st)) -
    #> [1] "5285368 bytes"
    # } +butchered_st <- butcher(st, verbose = TRUE) +
    #> Memory released: '254,144 B'
    #> x Disabled: `print()`, `summary()`
    #> [1] "12970664 bytes"
    format(object.size(butchered_st)) +
    #> [1] "7193880 bytes"
    # }
    @@ -115,7 +115,7 @@ @@ -155,7 +160,7 @@
    @@ -175,14 +180,15 @@

    Determine stacking coefficients from a data stack

    is typically used after a number of calls to add_candidates().

    -
    blend_predictions(
    -  data_stack,
    -  penalty = 10^(-6:-1),
    -  non_negative = TRUE,
    -  metric = NULL,
    -  control = tune::control_grid(),
    -  ...
    -)
    +
    blend_predictions(
    +  data_stack,
    +  penalty = 10^(-6:-1),
    +  mixture = 1,
    +  non_negative = TRUE,
    +  metric = NULL,
    +  control = tune::control_grid(),
    +  ...
    +)

    Arguments

    @@ -193,21 +199,31 @@

    Arg

    - + + + + + - @@ -229,6 +245,12 @@

    Value

    A model_stack object—while model_stacks largely contain the same elements as data_stacks, the primary data objects shift from the assessment set predictions to the member models.

    +

    Details

    + +

    Note that a regularized linear model is one of many possible +learning algorithms that could be used to fit a stacked ensemble +model. For implementations of additional ensemble learning algorithms, see +h2o::h2o.stackedEnsemble() and SuperLearner::SuperLearner().
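To make the interplay of penalty and mixture concrete, here is a minimal sketch using the reg_st data stack constructed in the examples below; the proposed values are tuned over as a cross product:

library(stacks)

reg_st %>%
  blend_predictions(
    penalty = 10^(-6:-1),    # proposed amounts of total regularization
    mixture = c(0, 0.5, 1)   # ridge, elastic net, and lasso blending models
  )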

    Example Data

    @@ -282,135 +304,153 @@

    Examp # clarification on the objects used in these examples! # put together a data stack -reg_st <- - stacks() %>% - add_candidates(reg_res_lr) %>% - add_candidates(reg_res_svm) %>% - add_candidates(reg_res_sp) +reg_st <- + stacks() %>% + add_candidates(reg_res_lr) %>% + add_candidates(reg_res_svm) %>% + add_candidates(reg_res_sp) -reg_st +reg_st
    #> # A data stack with 3 model definitions and 15 candidate members: #> # reg_res_lr: 1 model configuration #> # reg_res_svm: 5 model configurations #> # reg_res_sp: 9 model configurations #> # Outcome: latency (numeric)
    # evaluate the data stack -reg_st %>% - blend_predictions() +reg_st %>% + blend_predictions()
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Out of 15 possible candidate members, the ensemble retained 5. -#> Lasso penalty: 0.1.
    #> +#> Penalty: 1e-06. +#> Mixture: 1.
    #> #> The 5 highest weighted members are:
    #> # A tibble: 5 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 reg_res_svm_1_3 svm_rbf 0.987 -#> 2 reg_res_svm_1_4 svm_rbf 0.640 -#> 3 reg_res_svm_1_1 svm_rbf 0.405 -#> 4 reg_res_sp_9_1 linear_reg 0.294 -#> 5 reg_res_svm_1_5 svm_rbf 0.293
    #> +#> 1 reg_res_svm_1_1 svm_rbf 0.443 +#> 2 reg_res_sp_4_1 linear_reg 0.275 +#> 3 reg_res_svm_1_3 svm_rbf 0.270 +#> 4 reg_res_sp_9_1 linear_reg 0.0779 +#> 5 reg_res_sp_2_1 linear_reg 0.0410
    #> #> Members have not yet been fitted with `fit_members()`.
    # include fewer models by proposing higher penalties -reg_st %>% - blend_predictions(penalty = c(.5, 1)) +reg_st %>% + blend_predictions(penalty = c(.5, 1))
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 15 possible candidate members, the ensemble retained 3. -#> Lasso penalty: 1.
    #> -#> The 3 highest weighted members are:
    #> # A tibble: 3 x 3 +#> Out of 15 possible candidate members, the ensemble retained 5. +#> Penalty: 1. +#> Mixture: 1.
    #> +#> The 5 highest weighted members are:
    #> # A tibble: 5 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 reg_res_svm_1_3 svm_rbf 0.972 -#> 2 reg_res_svm_1_5 svm_rbf 0.337 -#> 3 reg_res_sp_9_1 linear_reg 0.284
    #> +#> 1 reg_res_svm_1_1 svm_rbf 0.435 +#> 2 reg_res_svm_1_3 svm_rbf 0.251 +#> 3 reg_res_sp_4_1 linear_reg 0.243 +#> 4 reg_res_sp_9_1 linear_reg 0.0901 +#> 5 reg_res_sp_2_1 linear_reg 0.0575
    #> #> Members have not yet been fitted with `fit_members()`.
    # allow for negative stacking coefficients # with the non_negative argument -reg_st %>% - blend_predictions(non_negative = FALSE) +reg_st %>% + blend_predictions(non_negative = FALSE)
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 15 possible candidate members, the ensemble retained 9. -#> Lasso penalty: 0.1.
    #> -#> The 9 highest weighted members are:
    #> # A tibble: 9 x 3 -#> member type weight -#> <chr> <chr> <dbl> -#> 1 reg_res_svm_1_5 svm_rbf 1.45 -#> 2 reg_res_svm_1_3 svm_rbf 1.02 -#> 3 reg_res_sp_9_1 linear_reg 1.02 -#> 4 reg_res_sp_5_1 linear_reg 0.188 -#> 5 reg_res_sp_6_1 linear_reg -0.0000246 -#> 6 reg_res_lr_1_1 linear_reg -0.0318 -#> 7 reg_res_sp_8_1 linear_reg -0.0557 -#> 8 reg_res_svm_1_2 svm_rbf -0.232 -#> 9 reg_res_sp_1_1 linear_reg -0.830
    #> +#> Out of 15 possible candidate members, the ensemble retained 10. +#> Penalty: 0.1. +#> Mixture: 1.
    #> +#> The 10 highest weighted members are:
    #> # A tibble: 10 x 3 +#> member type weight +#> <chr> <chr> <dbl> +#> 1 reg_res_sp_8_1 linear_reg 2.06 +#> 2 reg_res_sp_4_1 linear_reg 0.473 +#> 3 reg_res_svm_1_1 svm_rbf 0.468 +#> 4 reg_res_sp_9_1 linear_reg 0.266 +#> 5 reg_res_svm_1_3 svm_rbf 0.152 +#> 6 reg_res_sp_7_1 linear_reg 0.108 +#> 7 reg_res_svm_1_4 svm_rbf -0.0165 +#> 8 reg_res_sp_6_1 linear_reg -0.550 +#> 9 reg_res_sp_3_1 linear_reg -1.91 +#> 10 reg_res_svm_1_2 svm_rbf -7.37
    #> #> Members have not yet been fitted with `fit_members()`.
    # use a custom metric in tuning the lasso penalty -library(yardstick) -reg_st %>% - blend_predictions(metric = metric_set(rmse)) +library(yardstick) +reg_st %>% + blend_predictions(metric = metric_set(rmse))
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Out of 15 possible candidate members, the ensemble retained 5. -#> Lasso penalty: 0.1.
    #> +#> Penalty: 0.1. +#> Mixture: 1.
    #> #> The 5 highest weighted members are:
    #> # A tibble: 5 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 reg_res_svm_1_3 svm_rbf 0.987 -#> 2 reg_res_svm_1_4 svm_rbf 0.640 -#> 3 reg_res_svm_1_1 svm_rbf 0.405 -#> 4 reg_res_sp_9_1 linear_reg 0.294 -#> 5 reg_res_svm_1_5 svm_rbf 0.293
    #> +#> 1 reg_res_svm_1_1 svm_rbf 0.442 +#> 2 reg_res_svm_1_3 svm_rbf 0.265 +#> 3 reg_res_sp_4_1 linear_reg 0.261 +#> 4 reg_res_sp_9_1 linear_reg 0.0860 +#> 5 reg_res_sp_2_1 linear_reg 0.0480
    #> #> Members have not yet been fitted with `fit_members()`.
    # pass control options for stack blending -reg_st %>% - blend_predictions( - control = tune::control_grid(allow_par = TRUE) - ) +reg_st %>% + blend_predictions( + control = tune::control_grid(allow_par = TRUE) + )
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Out of 15 possible candidate members, the ensemble retained 5. -#> Lasso penalty: 0.1.
    #> +#> Penalty: 1e-06. +#> Mixture: 1.
    #> #> The 5 highest weighted members are:
    #> # A tibble: 5 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 reg_res_svm_1_3 svm_rbf 0.987 -#> 2 reg_res_svm_1_4 svm_rbf 0.640 -#> 3 reg_res_svm_1_1 svm_rbf 0.405 -#> 4 reg_res_sp_9_1 linear_reg 0.294 -#> 5 reg_res_svm_1_5 svm_rbf 0.293
    #> +#> 1 reg_res_svm_1_1 svm_rbf 0.443 +#> 2 reg_res_sp_4_1 linear_reg 0.275 +#> 3 reg_res_svm_1_3 svm_rbf 0.270 +#> 4 reg_res_sp_9_1 linear_reg 0.0779 +#> 5 reg_res_sp_2_1 linear_reg 0.0410
    #> #> Members have not yet been fitted with `fit_members()`.
    # the process looks the same with # multinomial classification models -class_st <- - stacks() %>% - add_candidates(class_res_nn) %>% - add_candidates(class_res_rf) %>% - blend_predictions() -
    #> ! Bootstrap06: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -99); ...
    #> ! Bootstrap19: internal: No observations were detected in `truth` for level(s): 'low', ...
    -class_st +class_st <- + stacks() %>% + add_candidates(class_res_nn) %>% + add_candidates(class_res_rf) %>% + blend_predictions() +
    #> ! Bootstrap03: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -77); ...
    #> ! Bootstrap06: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -70); ...
    #> ! Bootstrap07: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -72); ...
    #> ! Bootstrap16: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -68); ...
    #> ! Bootstrap18: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -89); ...
    #> ! Bootstrap21: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -56); ...
    #> ! Bootstrap22: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -66); ...
    #> ! Bootstrap24: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -61); ...
    +class_st
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 22 possible candidate members, the ensemble retained 3. -#> Lasso penalty: 0.1.
    #> Across the 3 classes, there are an average of 1.5 coefficients per class.
    #> -#> The 3 highest weighted member classes are:
    #> # A tibble: 3 x 4 -#> member type weight class -#> <chr> <chr> <dbl> <chr> -#> 1 .pred_full_class_res_nn_1_1 mlp 10.2 full -#> 2 .pred_mid_class_res_rf_1_02 rand_forest 0.888 mid -#> 3 .pred_full_class_res_rf_1_01 rand_forest 0.690 full
    #> +#> Out of 22 possible candidate members, the ensemble retained 10. +#> Penalty: 0.001. +#> Mixture: 1.
    #> Across the 3 classes, there are an average of 3.33 coefficients per class.
    #> +#> The 10 highest weighted member classes are:
    #> # A tibble: 10 x 4 +#> member type weight class +#> <chr> <chr> <dbl> <chr> +#> 1 .pred_full_class_res_nn_1_1 mlp 28.8 full +#> 2 .pred_mid_class_res_rf_1_01 rand_forest 10.9 mid +#> 3 .pred_mid_class_res_nn_1_1 mlp 7.82 mid +#> 4 .pred_mid_class_res_rf_1_04 rand_forest 5.76 low +#> 5 .pred_mid_class_res_rf_1_08 rand_forest 5.53 low +#> 6 .pred_mid_class_res_rf_1_07 rand_forest 4.48 low +#> 7 .pred_mid_class_res_rf_1_05 rand_forest 1.80 mid +#> 8 .pred_mid_class_res_rf_1_10 rand_forest 1.36 mid +#> 9 .pred_mid_class_res_rf_1_02 rand_forest 0.552 low +#> 10 .pred_full_class_res_rf_1_04 rand_forest 0.284 mid
    #> #> Members have not yet been fitted with `fit_members()`.
    # ...or binomial classification models -log_st <- - stacks() %>% - add_candidates(log_res_nn) %>% - add_candidates(log_res_rf) %>% - blend_predictions() +log_st <- + stacks() %>% + add_candidates(log_res_nn) %>% + add_candidates(log_res_rf) %>% + blend_predictions() -log_st +log_st
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 11 possible candidate members, the ensemble retained 3. -#> Lasso penalty: 0.1.
    #> -#> The 3 highest weighted member classes are:
    #> # A tibble: 3 x 3 +#> Out of 11 possible candidate members, the ensemble retained 4. +#> Penalty: 1e-04. +#> Mixture: 1.
    #> +#> The 4 highest weighted member classes are:
    #> # A tibble: 4 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 .pred_yes_log_res_nn_1_1 mlp 5.08 -#> 2 .pred_yes_log_res_rf_1_03 rand_forest 1.00 -#> 3 .pred_yes_log_res_rf_1_01 rand_forest 0.287
    #> +#> 1 .pred_yes_log_res_nn_1_1 mlp 6.11 +#> 2 .pred_yes_log_res_rf_1_09 rand_forest 1.85 +#> 3 .pred_yes_log_res_rf_1_05 rand_forest 1.45 +#> 4 .pred_yes_log_res_rf_1_06 rand_forest 0.836
    #> #> Members have not yet been fitted with `fit_members()`.
    # }
    diff --git a/docs/reference/build_linear_predictor.html b/docs/reference/build_linear_predictor.html index 90ecc6ba..88c59856 100644 --- a/docs/reference/build_linear_predictor.html +++ b/docs/reference/build_linear_predictor.html @@ -98,7 +98,7 @@ stacks @@ -106,7 +106,7 @@ @@ -147,7 +152,7 @@ @@ -156,23 +161,23 @@

Creates an R expression for a linear predictor from a data frame of terms and coefficients

    -
    build_linear_predictor(x, ...)
    +    
    build_linear_predictor(x, ...)
     
     # S3 method for `_elnet`
    -build_linear_predictor(x, ...)
    +build_linear_predictor(x, ...)
     
     # S3 method for `_lognet`
    -build_linear_predictor(x, ...)
    +build_linear_predictor(x, ...)
     
     # S3 method for `_multnet`
    -build_linear_predictor(x, ...)
    +build_linear_predictor(x, ...)

    Arguments

    penalty

    A numeric vector of proposed penalty values used in member -weighting. Higher penalties will generally result in fewer members -being included in the resulting model stack, and vice versa. This argument -will be tuned on unless a single penalty value is given.

    A numeric vector of proposed values for total amount of +regularization used in member weighting. Higher penalties will generally +result in fewer members being included in the resulting model stack, and +vice versa. The package will tune over a grid formed from the cross +product of the penalty and mixture arguments.

    mixture

    A number between zero and one (inclusive) giving the +proportion of L1 regularization (i.e. lasso) in the model. mixture = 1 +indicates a pure lasso model, mixture = 0 indicates ridge regression, and +values in (0, 1) indicate an elastic net. The package will tune over +a grid formed from the cross product of the penalty and mixture +arguments.

    non_negative

    A logical giving whether to restrict stacking coefficients to non-negative values. If TRUE (default), 0 is passed as -the lower.limits argument to glmnet::glmnet() in fitting the +the lower.limits argument to glmnet::glmnet() in fitting the model on the data stack. Otherwise, -Inf.

    metric

    A call to yardstick::metric_set(). The metric(s) to use in +

    A call to yardstick::metric_set(). The metric(s) to use in tuning the lasso penalty on the stacking coefficients. Default values are determined by tune::tune_grid() from the outcome class.

    - + diff --git a/docs/reference/collect_parameters.html b/docs/reference/collect_parameters.html index a2450bfe..4ede9292 100644 --- a/docs/reference/collect_parameters.html +++ b/docs/reference/collect_parameters.html @@ -98,7 +98,7 @@ stacks @@ -106,7 +106,7 @@ @@ -146,7 +151,7 @@
    @@ -157,16 +162,16 @@

    Collect candidate parameters and stacking coefficients

    to their stacking coefficients as well).

    -
    collect_parameters(stack, candidates, ...)
    +    
    collect_parameters(stack, candidates, ...)
     
     # S3 method for default
    -collect_parameters(stack, candidates, ...)
    +collect_parameters(stack, candidates, ...)
     
     # S3 method for data_stack
    -collect_parameters(stack, candidates, ...)
    +collect_parameters(stack, candidates, ...)
     
     # S3 method for model_stack
    -collect_parameters(stack, candidates, ...)
    +collect_parameters(stack, candidates, ...)

    Arguments

    x

    An object that uses a glmnet::glmnet() model and all numeric predictors.

    An object that uses a glmnet::glmnet() model and all numeric predictors.

    ...
    @@ -239,59 +244,59 @@

    Examp # put together a data stack using # tuning results for regression models -reg_st <- - stacks() %>% - add_candidates(reg_res_lr) %>% - add_candidates(reg_res_svm) %>% - add_candidates(reg_res_sp, "spline") +reg_st <- + stacks() %>% + add_candidates(reg_res_lr) %>% + add_candidates(reg_res_svm) %>% + add_candidates(reg_res_sp, "spline") -reg_st +reg_st
    #> # A data stack with 3 model definitions and 15 candidate members: #> # reg_res_lr: 1 model configuration #> # reg_res_svm: 5 model configurations #> # spline: 9 model configurations #> # Outcome: latency (numeric)
    # check out the hyperparameters for some of the candidates -collect_parameters(reg_st, "reg_res_svm") +collect_parameters(reg_st, "reg_res_svm")
    #> # A tibble: 5 x 3 #> member cost rbf_sigma #> <chr> <dbl> <dbl> -#> 1 reg_res_svm_1_1 12.0 3.62e- 8 -#> 2 reg_res_svm_1_2 0.0221 4.15e- 3 -#> 3 reg_res_svm_1_3 0.215 5.07e- 2 -#> 4 reg_res_svm_1_4 0.00116 9.29e-10 -#> 5 reg_res_svm_1_5 1.44 5.40e- 6
    -collect_parameters(reg_st, "spline") +#> 1 reg_res_svm_1_1 17.2 1.87e- 1 +#> 2 reg_res_svm_1_2 0.00129 1.28e- 7 +#> 3 reg_res_svm_1_3 3.26 2.54e- 3 +#> 4 reg_res_svm_1_4 0.111 5.19e-10 +#> 5 reg_res_svm_1_5 0.0241 4.02e- 5
    +collect_parameters(reg_st, "spline")
    #> # A tibble: 9 x 2 #> member age #> <chr> <int> -#> 1 spline_1_1 12 -#> 2 spline_2_1 7 -#> 3 spline_3_1 9 -#> 4 spline_4_1 10 -#> 5 spline_5_1 2 -#> 6 spline_6_1 13 -#> 7 spline_7_1 4 -#> 8 spline_8_1 6 -#> 9 spline_9_1 14
    +#> 1 spline_1_1 8 +#> 2 spline_2_1 14 +#> 3 spline_3_1 5 +#> 4 spline_4_1 13 +#> 5 spline_5_1 3 +#> 6 spline_6_1 6 +#> 7 spline_7_1 10 +#> 8 spline_8_1 2 +#> 9 spline_9_1 12
    # blend the data stack to view the hyperparameters # along with the stacking coefficients! -collect_parameters( - reg_st %>% blend_predictions(), +collect_parameters( + reg_st %>% blend_predictions(), "spline" -) +)
    #> # A tibble: 9 x 3 -#> member age coef -#> <chr> <int> <dbl> -#> 1 spline_1_1 12 0 -#> 2 spline_2_1 7 0 -#> 3 spline_3_1 9 0 -#> 4 spline_4_1 10 0 -#> 5 spline_5_1 2 0 -#> 6 spline_6_1 13 0 -#> 7 spline_7_1 4 0 -#> 8 spline_8_1 6 0 -#> 9 spline_9_1 14 0.294
    # } +#> member age coef +#> <chr> <int> <dbl> +#> 1 spline_1_1 8 0 +#> 2 spline_2_1 14 0.0480 +#> 3 spline_3_1 5 0 +#> 4 spline_4_1 13 0.261 +#> 5 spline_5_1 3 0 +#> 6 spline_6_1 6 0 +#> 7 spline_7_1 10 0 +#> 8 spline_8_1 2 0 +#> 9 spline_9_1 12 0.0860
    # }
    @@ -111,7 +111,7 @@ @@ -151,7 +156,7 @@
    @@ -167,11 +172,11 @@

    Control wrappers

    with the arguments save_pred = TRUE, save_workflow = TRUE.

    -
    control_stack_grid()
    +    
    control_stack_grid()
     
    -control_stack_resamples()
    +control_stack_resamples()
     
    -control_stack_bayes()
    +control_stack_bayes()
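For instance, a sketch of how one of these wrappers slots into tuning, assuming a workflow reg_wf_svm and resamples reg_folds like those used to generate this package's example objects:

library(tune)
library(stacks)

# equivalent to tune::control_grid(save_pred = TRUE, save_workflow = TRUE),
# so the results carry everything add_candidates() needs later
reg_res_svm <-
  tune_grid(
    reg_wf_svm,
    resamples = reg_folds,
    grid = 5,
    control = control_stack_grid()
  )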

    Value

    diff --git a/docs/reference/example_data.html b/docs/reference/example_data.html index 7d00c9c2..fde1b290 100644 --- a/docs/reference/example_data.html +++ b/docs/reference/example_data.html @@ -96,7 +96,7 @@ stacks @@ -104,7 +104,7 @@ @@ -144,7 +149,7 @@
    @@ -153,23 +158,23 @@

    Example Objects

    and vignettes derived from a study on 1212 red-eyed tree frog embryos!

    -
    reg_res_svm
    +    
    reg_res_svm
     
    -reg_res_sp
    +reg_res_sp
     
    -reg_res_lr
    +reg_res_lr
     
    -reg_folds
    +reg_folds
     
    -class_res_nn
    +class_res_nn
     
    -class_res_rf
    +class_res_rf
     
    -class_folds
    +class_folds
     
    -log_res_nn
    +log_res_nn
     
    -log_res_rf
    +log_res_rf

    Format

    @@ -185,9 +190,9 @@

Format
An object of class tune_results (inherits from tbl_df, tbl, data.frame) with 5 rows and 5 columns.

    Source

    -

    Julie Jung et al. (Forthcoming) Multimodal mechanosensing enables treefrog +

    Julie Jung et al. (2020) Multimodal mechanosensing enables treefrog embryos to escape egg-predators. -https://doi.org/10.1101/2020.09.18.304295

    +https://doi.org/10.1242/jeb.236141

    Details

    Red-eyed tree frog (RETF) embryos can hatch earlier than their normal @@ -223,228 +228,229 @@

Details to the stimulus) using nearly all of the other variables as predictors.

    The source code for generating these objects is given below.

    # setup: packages, data, resample, basic recipe ------------------------
    -library(stacks)
    -library(tune)
    -library(rsample)
    -library(parsnip)
    -library(workflows)
    -library(recipes)
    -library(yardstick)
    -
    -set.seed(1)
    -
    -ctrl_grid <- 
    -  tune::control_grid(
    -    save_pred = TRUE,
    -    save_workflow = TRUE
    -  )
    -
    -ctrl_res <- 
    -  tune::control_resamples(
    -    save_pred = TRUE,
    -    save_workflow = TRUE
    -  )
    +library(stacks)
    +library(tune)
    +library(rsample)
    +library(parsnip)
    +library(workflows)
    +library(recipes)
    +library(yardstick)
    +library(workflowsets)
    +
    +set.seed(1)
    +
    +ctrl_grid <- 
    +  tune::control_grid(
    +    save_pred = TRUE,
    +    save_workflow = TRUE
    +  )
    +
    +ctrl_res <- 
    +  tune::control_resamples(
    +    save_pred = TRUE,
    +    save_workflow = TRUE
    +  )
     
     # for regression, predict latency to hatch (excluding NAs)
    -tree_frogs_reg <- 
    -  tree_frogs %>% 
    -  filter(!is.na(latency)) %>%
    -  select(-clutch, -hatched)
    +tree_frogs_reg <- 
    +  tree_frogs %>% 
    +  filter(!is.na(latency)) %>%
    +  select(-clutch, -hatched)
     
    -set.seed(1)
    -tree_frogs_reg_split <- rsample::initial_split(tree_frogs_reg)
    +set.seed(1)
    +tree_frogs_reg_split <- rsample::initial_split(tree_frogs_reg)
     
    -set.seed(1)
    -tree_frogs_reg_train <- rsample::training(tree_frogs_reg_split)
    +set.seed(1)
    +tree_frogs_reg_train <- rsample::training(tree_frogs_reg_split)
     
    -set.seed(1)
    -tree_frogs_reg_test  <- rsample::testing(tree_frogs_reg_split)
    +set.seed(1)
    +tree_frogs_reg_test  <- rsample::testing(tree_frogs_reg_split)
     
    -set.seed(1)
    -reg_folds <- rsample::vfold_cv(tree_frogs_reg_train, v = 5)
    +set.seed(1)
    +reg_folds <- rsample::vfold_cv(tree_frogs_reg_train, v = 5)
     
    -tree_frogs_reg_rec <- 
    -  recipes::recipe(latency ~ ., data = tree_frogs_reg_train) %>%
    -  recipes::step_dummy(recipes::all_nominal()) %>%
    -  recipes::step_zv(recipes::all_predictors())
    +tree_frogs_reg_rec <- 
    +  recipes::recipe(latency ~ ., data = tree_frogs_reg_train) %>%
    +  recipes::step_dummy(recipes::all_nominal()) %>%
    +  recipes::step_zv(recipes::all_predictors())
     
    -metric <- yardstick::metric_set(yardstick::rmse)
    +metric <- yardstick::metric_set(yardstick::rmse)
     
     # linear regression ---------------------------------------
    -lin_reg_spec <-
    -  parsnip::linear_reg() %>%
    -  parsnip::set_engine("lm")
    -
    -reg_wf_lr <- 
    -  workflows::workflow() %>%
    -  workflows::add_model(lin_reg_spec) %>%
    -  workflows::add_recipe(tree_frogs_reg_rec)
    -
    -set.seed(1)
    -reg_res_lr <- 
    -  tune::fit_resamples(
    -    object = reg_wf_lr,
    -    resamples = reg_folds,
    -    metrics = metric,
    -    control = ctrl_res
    -  )
    +lin_reg_spec <-
    +  parsnip::linear_reg() %>%
    +  parsnip::set_engine("lm")
    +
    +reg_wf_lr <- 
    +  workflows::workflow() %>%
    +  workflows::add_model(lin_reg_spec) %>%
    +  workflows::add_recipe(tree_frogs_reg_rec)
    +
    +set.seed(1)
    +reg_res_lr <- 
    +  tune::fit_resamples(
    +    object = reg_wf_lr,
    +    resamples = reg_folds,
    +    metrics = metric,
    +    control = ctrl_res
    +  )
     
     # SVM regression ----------------------------------
    -svm_spec <- 
    -  parsnip::svm_rbf(
    -    cost = tune::tune(), 
    -    rbf_sigma = tune::tune()
    -  ) %>%
    -  parsnip::set_engine("kernlab") %>%
    -  parsnip::set_mode("regression")
    -
    -reg_wf_svm <- 
    -  workflows::workflow() %>%
    -  workflows::add_model(svm_spec) %>%
    -  workflows::add_recipe(tree_frogs_reg_rec)
    -
    -set.seed(1)
    -reg_res_svm <- 
    -  tune::tune_grid(
    -    object = reg_wf_svm,
    -    resamples = reg_folds, 
    -    grid = 5,
    -    control = ctrl_grid
    -  )
    +svm_spec <- 
    +  parsnip::svm_rbf(
    +    cost = tune::tune(), 
    +    rbf_sigma = tune::tune()
    +  ) %>%
    +  parsnip::set_engine("kernlab") %>%
    +  parsnip::set_mode("regression")
    +
    +reg_wf_svm <- 
    +  workflows::workflow() %>%
    +  workflows::add_model(svm_spec) %>%
    +  workflows::add_recipe(tree_frogs_reg_rec)
    +
    +set.seed(1)
    +reg_res_svm <- 
    +  tune::tune_grid(
    +    object = reg_wf_svm,
    +    resamples = reg_folds, 
    +    grid = 5,
    +    control = ctrl_grid
    +  )
     
     # spline regression ---------------------------------------
    -spline_rec <- 
    -  tree_frogs_reg_rec %>%
    -  recipes::step_ns(age, deg_free = tune::tune("age"))
    -
    -reg_wf_sp <- 
    -  workflows::workflow() %>%
    -  workflows::add_model(lin_reg_spec) %>%
    -  workflows::add_recipe(spline_rec)
    -
    -set.seed(1)
    -reg_res_sp <- 
    -  tune::tune_grid(
    -    object = reg_wf_sp,
    -    resamples = reg_folds,
    -    metrics = metric,
    -    control = ctrl_grid
    -  )
    +spline_rec <- 
    +  tree_frogs_reg_rec %>%
    +  recipes::step_ns(age, deg_free = tune::tune("age"))
    +
    +reg_wf_sp <- 
    +  workflows::workflow() %>%
    +  workflows::add_model(lin_reg_spec) %>%
    +  workflows::add_recipe(spline_rec)
    +
    +set.seed(1)
    +reg_res_sp <- 
    +  tune::tune_grid(
    +    object = reg_wf_sp,
    +    resamples = reg_folds,
    +    metrics = metric,
    +    control = ctrl_grid
    +  )
     
     # classification - preliminaries -----------------------------------
    -tree_frogs_class <- 
    -  tree_frogs %>%
    -  dplyr::select(-c(clutch, latency))
    +tree_frogs_class <- 
    +  tree_frogs %>%
    +  dplyr::select(-c(clutch, latency))
     
    -set.seed(1)
    -tree_frogs_class_split <- rsample::initial_split(tree_frogs_class)
    +set.seed(1)
    +tree_frogs_class_split <- rsample::initial_split(tree_frogs_class)
     
    -set.seed(1)
    -tree_frogs_class_train <- rsample::training(tree_frogs_class_split)
    +set.seed(1)
    +tree_frogs_class_train <- rsample::training(tree_frogs_class_split)
     
    -set.seed(1)
    -tree_frogs_class_test  <- rsample::testing(tree_frogs_class_split)
    +set.seed(1)
    +tree_frogs_class_test  <- rsample::testing(tree_frogs_class_split)
     
    -set.seed(1)
    -class_folds <- rsample::vfold_cv(tree_frogs_class_train, v = 5)
    +set.seed(1)
    +class_folds <- rsample::vfold_cv(tree_frogs_class_train, v = 5)
     
    -tree_frogs_class_rec <- 
    -  recipes::recipe(reflex ~ ., data = tree_frogs_class_train) %>%
    -  recipes::step_dummy(recipes::all_nominal(), -reflex) %>%
    -  recipes::step_zv(recipes::all_predictors()) %>%
    -  recipes::step_normalize(recipes::all_numeric())
    +tree_frogs_class_rec <- 
    +  recipes::recipe(reflex ~ ., data = tree_frogs_class_train) %>%
    +  recipes::step_dummy(recipes::all_nominal(), -reflex) %>%
    +  recipes::step_zv(recipes::all_predictors()) %>%
    +  recipes::step_normalize(recipes::all_numeric())
     
     # random forest classification --------------------------------------
    -rand_forest_spec <- 
    -  parsnip::rand_forest(
    -    mtry = tune::tune(),
    -    trees = 500,
    -    min_n = tune::tune()
    -  ) %>%
    -  parsnip::set_mode("classification") %>%
    -  parsnip::set_engine("ranger")
    -
    -class_wf_rf <-
    -  workflows::workflow() %>%
    -  workflows::add_recipe(tree_frogs_class_rec) %>%
    -  workflows::add_model(rand_forest_spec)
    -
    -set.seed(1)
    -class_res_rf <- 
    -  tune::tune_grid(
    -    object = class_wf_rf, 
    -    resamples = class_folds, 
    -    grid = 10,
    -    control = ctrl_grid
    -  )
    +rand_forest_spec <- 
    +  parsnip::rand_forest(
    +    mtry = tune::tune(),
    +    trees = 500,
    +    min_n = tune::tune()
    +  ) %>%
    +  parsnip::set_mode("classification") %>%
    +  parsnip::set_engine("ranger")
    +
    +class_wf_rf <-
    +  workflows::workflow() %>%
    +  workflows::add_recipe(tree_frogs_class_rec) %>%
    +  workflows::add_model(rand_forest_spec)
    +
    +set.seed(1)
    +class_res_rf <- 
    +  tune::tune_grid(
    +    object = class_wf_rf, 
    +    resamples = class_folds, 
    +    grid = 10,
    +    control = ctrl_grid
    +  )
     
     # neural network classification -------------------------------------
    -nnet_spec <-
    -  mlp(hidden_units = 5, penalty = 0.01, epochs = 100) %>%
    -  set_mode("classification") %>%
    -  set_engine("nnet")
    -
    -class_wf_nn <- 
    -  workflows::workflow() %>%
    -  workflows::add_recipe(tree_frogs_class_rec) %>%
    -  workflows::add_model(nnet_spec)
    -
    -set.seed(1)
    -class_res_nn <-
    -  tune::fit_resamples(
    -    object = class_wf_nn, 
    -    resamples = class_folds, 
    -    control = ctrl_res
    -  )
    +nnet_spec <-
    +  mlp(hidden_units = 5, penalty = 0.01, epochs = 100) %>%
    +  set_mode("classification") %>%
    +  set_engine("nnet")
    +
    +class_wf_nn <- 
    +  workflows::workflow() %>%
    +  workflows::add_recipe(tree_frogs_class_rec) %>%
    +  workflows::add_model(nnet_spec)
    +
    +set.seed(1)
    +class_res_nn <-
    +  tune::fit_resamples(
    +    object = class_wf_nn, 
    +    resamples = class_folds, 
    +    control = ctrl_res
    +  )
     
     # binary classification --------------------------------
    -tree_frogs_2_class_rec <- 
    -  recipes::recipe(hatched ~ ., data = tree_frogs_class_train) %>%
    -  recipes::step_dummy(recipes::all_nominal(), -hatched) %>%
    -  recipes::step_zv(recipes::all_predictors()) %>%
    -  recipes::step_normalize(recipes::all_numeric())
    -
    -set.seed(1)
    -rand_forest_spec_2 <- 
    -  parsnip::rand_forest(
    -    mtry = tune(),
    -    trees = 500,
    -    min_n = tune()
    -  ) %>%
    -  parsnip::set_mode("classification") %>%
    -  parsnip::set_engine("ranger")
    -
    -log_wf_rf <-
    -  workflows::workflow() %>%
    -  workflows::add_recipe(tree_frogs_2_class_rec) %>%
    -  workflows::add_model(rand_forest_spec_2)
    -
    -set.seed(1)
    -log_res_rf <- 
    -  tune::tune_grid(
    -    object = log_wf_rf, 
    -    resamples = class_folds, 
    -    grid = 10,
    -    control = ctrl_grid
    -  )
    -
    -nnet_spec_2 <-
    -  parsnip::mlp(epochs = 100, hidden_units = 5, penalty = 0.1) %>%
    -  parsnip::set_mode("classification") %>%
    -  parsnip::set_engine("nnet", verbose = 0)
    -
    -log_wf_nn <- 
    -  workflows::workflow() %>%
    -  workflows::add_recipe(tree_frogs_2_class_rec) %>%
    -  workflows::add_model(nnet_spec_2)
    -
    -set.seed(1)
    -log_res_nn <-
    -  tune::fit_resamples(
    -    object = log_wf_nn, 
    -    resamples = class_folds, 
    -    control = ctrl_res
    -  )
    +tree_frogs_2_class_rec <- 
    +  recipes::recipe(hatched ~ ., data = tree_frogs_class_train) %>%
    +  recipes::step_dummy(recipes::all_nominal(), -hatched) %>%
    +  recipes::step_zv(recipes::all_predictors()) %>%
    +  recipes::step_normalize(recipes::all_numeric())
    +
    +set.seed(1)
    +rand_forest_spec_2 <- 
    +  parsnip::rand_forest(
    +    mtry = tune(),
    +    trees = 500,
    +    min_n = tune()
    +  ) %>%
    +  parsnip::set_mode("classification") %>%
    +  parsnip::set_engine("ranger")
    +
    +log_wf_rf <-
    +  workflows::workflow() %>%
    +  workflows::add_recipe(tree_frogs_2_class_rec) %>%
    +  workflows::add_model(rand_forest_spec_2)
    +
    +set.seed(1)
    +log_res_rf <- 
    +  tune::tune_grid(
    +    object = log_wf_rf, 
    +    resamples = class_folds, 
    +    grid = 10,
    +    control = ctrl_grid
    +  )
    +
    +nnet_spec_2 <-
    +  parsnip::mlp(epochs = 100, hidden_units = 5, penalty = 0.1) %>%
    +  parsnip::set_mode("classification") %>%
    +  parsnip::set_engine("nnet", verbose = 0)
    +
    +log_wf_nn <- 
    +  workflows::workflow() %>%
    +  workflows::add_recipe(tree_frogs_2_class_rec) %>%
    +  workflows::add_model(nnet_spec_2)
    +
    +set.seed(1)
    +log_res_nn <-
    +  tune::fit_resamples(
    +    object = log_wf_nn, 
    +    resamples = class_folds, 
    +    control = ctrl_res
    +  )
     

    diff --git a/docs/reference/fit_members.html b/docs/reference/fit_members.html index 5e61ddb5..791a0036 100644 --- a/docs/reference/fit_members.html +++ b/docs/reference/fit_members.html @@ -100,7 +100,7 @@ stacks @@ -108,7 +108,7 @@ @@ -148,7 +153,7 @@
    @@ -161,7 +166,7 @@

    Fit model stack members with non-zero stacking coefficients

    training set using fit_members().

    -
    fit_members(model_stack, ...)
    +
    fit_members(model_stack, ...)
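A minimal sketch of where fit_members() sits in the pipeline, using the reg_st data stack built in the examples below; only candidates with non-zero stacking coefficients are refit, on the full training set:

library(stacks)

model_st <-
  reg_st %>%
  blend_predictions() %>%  # determine stacking coefficients
  fit_members()            # refit the retained members on the training set

# the fitted stack can then predict on new data
predict(model_st, tree_frogs_reg_test)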

    Arguments

    @@ -237,83 +242,86 @@

    Examp # clarification on the objects used in these examples! # put together a data stack -reg_st <- - stacks() %>% - add_candidates(reg_res_lr) %>% - add_candidates(reg_res_svm) %>% - add_candidates(reg_res_sp) +reg_st <- + stacks() %>% + add_candidates(reg_res_lr) %>% + add_candidates(reg_res_svm) %>% + add_candidates(reg_res_sp) -reg_st +reg_st
    #> # A data stack with 3 model definitions and 15 candidate members: #> # reg_res_lr: 1 model configuration #> # reg_res_svm: 5 model configurations #> # reg_res_sp: 9 model configurations #> # Outcome: latency (numeric)
    # evaluate the data stack and fit the member models -reg_st %>% - blend_predictions() %>% - fit_members() +reg_st %>% + blend_predictions() %>% + fit_members()
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Out of 15 possible candidate members, the ensemble retained 5. -#> Lasso penalty: 0.1.
    #> +#> Penalty: 0.1. +#> Mixture: 1.
    #> #> The 5 highest weighted members are:
    #> # A tibble: 5 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 reg_res_svm_1_3 svm_rbf 0.987 -#> 2 reg_res_svm_1_4 svm_rbf 0.640 -#> 3 reg_res_svm_1_1 svm_rbf 0.405 -#> 4 reg_res_sp_9_1 linear_reg 0.294 -#> 5 reg_res_svm_1_5 svm_rbf 0.293
    -reg_st +#> 1 reg_res_svm_1_1 svm_rbf 0.442 +#> 2 reg_res_svm_1_3 svm_rbf 0.265 +#> 3 reg_res_sp_4_1 linear_reg 0.261 +#> 4 reg_res_sp_9_1 linear_reg 0.0860 +#> 5 reg_res_sp_2_1 linear_reg 0.0480
    +reg_st
    #> # A data stack with 3 model definitions and 15 candidate members: #> # reg_res_lr: 1 model configuration #> # reg_res_svm: 5 model configurations #> # reg_res_sp: 9 model configurations #> # Outcome: latency (numeric)
    # do the same with multinomial classification models -class_st <- - stacks() %>% - add_candidates(class_res_nn) %>% - add_candidates(class_res_rf) %>% - blend_predictions() %>% - fit_members() -
    #> ! Bootstrap01: internal: No observations were detected in `truth` for level(s): 'low', ...
    #> ! Bootstrap15: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -82); ...
    -class_st +class_st <- + stacks() %>% + add_candidates(class_res_nn) %>% + add_candidates(class_res_rf) %>% + blend_predictions() %>% + fit_members() +
    #> ! Bootstrap05: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -59); ...
    #> ! Bootstrap19: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -86); ...
    #> ! Bootstrap22: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -61); ...
    +class_st
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 22 possible candidate members, the ensemble retained 17. -#> Lasso penalty: 1e-06.
    #> Across the 3 classes, there are an average of 5.67 coefficients per class.
    #> +#> Out of 22 possible candidate members, the ensemble retained 10. +#> Penalty: 0.001. +#> Mixture: 1.
    #> Across the 3 classes, there are an average of 3.33 coefficients per class.
    #> #> The 10 highest weighted member classes are:
    #> # A tibble: 10 x 4 #> member type weight class #> <chr> <chr> <dbl> <chr> -#> 1 .pred_full_class_res_nn_1_1 mlp 136. full -#> 2 .pred_full_class_res_nn_1_1 mlp 43.4 mid -#> 3 .pred_mid_class_res_rf_1_04 rand_forest 37.2 mid -#> 4 .pred_full_class_res_rf_1_04 rand_forest 31.2 mid -#> 5 .pred_mid_class_res_rf_1_05 rand_forest 31.0 low -#> 6 .pred_mid_class_res_rf_1_09 rand_forest 18.7 low -#> 7 .pred_mid_class_res_rf_1_10 rand_forest 8.41 low -#> 8 .pred_mid_class_res_rf_1_02 rand_forest 7.30 mid -#> 9 .pred_full_class_res_rf_1_05 rand_forest 6.61 full -#> 10 .pred_mid_class_res_rf_1_03 rand_forest 4.33 low
    +#> 1 .pred_full_class_res_nn_1_1 mlp 28.8 full +#> 2 .pred_mid_class_res_rf_1_01 rand_forest 10.9 mid +#> 3 .pred_mid_class_res_nn_1_1 mlp 7.82 mid +#> 4 .pred_mid_class_res_rf_1_04 rand_forest 5.76 low +#> 5 .pred_mid_class_res_rf_1_08 rand_forest 5.53 low +#> 6 .pred_mid_class_res_rf_1_07 rand_forest 4.48 low +#> 7 .pred_mid_class_res_rf_1_05 rand_forest 1.80 mid +#> 8 .pred_mid_class_res_rf_1_10 rand_forest 1.36 mid +#> 9 .pred_mid_class_res_rf_1_02 rand_forest 0.552 low +#> 10 .pred_full_class_res_rf_1_04 rand_forest 0.284 mid
    # ...or binomial classification models -log_st <- - stacks() %>% - add_candidates(log_res_nn) %>% - add_candidates(log_res_rf) %>% - blend_predictions() %>% - fit_members() +log_st <- + stacks() %>% + add_candidates(log_res_nn) %>% + add_candidates(log_res_rf) %>% + blend_predictions() %>% + fit_members() -log_st +log_st
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> #> Out of 11 possible candidate members, the ensemble retained 4. -#> Lasso penalty: 1e-04.
    #> +#> Penalty: 1e-05. +#> Mixture: 1.
    #> #> The 4 highest weighted member classes are:
    #> # A tibble: 4 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 .pred_yes_log_res_nn_1_1 mlp 7.58 -#> 2 .pred_yes_log_res_rf_1_09 rand_forest 1.49 -#> 3 .pred_yes_log_res_rf_1_03 rand_forest 1.13 -#> 4 .pred_yes_log_res_rf_1_01 rand_forest 1.03
    # } +#> 1 .pred_yes_log_res_nn_1_1 mlp 6.09 +#> 2 .pred_yes_log_res_rf_1_09 rand_forest 1.87 +#> 3 .pred_yes_log_res_rf_1_05 rand_forest 1.45 +#> 4 .pred_yes_log_res_rf_1_06 rand_forest 0.842
    # }
    diff --git a/docs/reference/get_expressions.html b/docs/reference/get_expressions.html index 4153f7bf..7fcb56b7 100644 --- a/docs/reference/get_expressions.html +++ b/docs/reference/get_expressions.html @@ -95,7 +95,7 @@ stacks @@ -103,7 +103,7 @@ @@ -143,7 +148,7 @@
    @@ -151,16 +156,16 @@

    Obtain prediction equations for all possible values of type

    Obtain prediction equations for all possible values of type

    -
    get_expressions(x, ...)
    +    
    get_expressions(x, ...)
     
     # S3 method for `_multnet`
    -get_expressions(x, ...)
    +get_expressions(x, ...)
     
     # S3 method for `_lognet`
    -get_expressions(x, ...)
    +get_expressions(x, ...)
     
     # S3 method for `_elnet`
    -get_expressions(x, ...)
    +get_expressions(x, ...)

    Arguments

    diff --git a/docs/reference/index.html b/docs/reference/index.html index 727224aa..ae60d65c 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -94,7 +94,7 @@ stacks @@ -102,7 +102,7 @@ diff --git a/docs/reference/predict.data_stack.html b/docs/reference/predict.data_stack.html index 7b4d3a4f..d3219d26 100644 --- a/docs/reference/predict.data_stack.html +++ b/docs/reference/predict.data_stack.html @@ -96,7 +96,7 @@ stacks @@ -104,7 +104,7 @@ @@ -144,7 +149,7 @@
    @@ -154,7 +159,7 @@

    Predicting with a model stack

    # S3 method for data_stack
    -predict(object, ...)
    +predict(object, ...)

    Arguments

    diff --git a/docs/reference/predict.model_stack.html b/docs/reference/predict.model_stack.html index 38ea2ab1..bed28e35 100644 --- a/docs/reference/predict.model_stack.html +++ b/docs/reference/predict.model_stack.html @@ -95,7 +95,7 @@ stacks @@ -103,7 +103,7 @@ @@ -143,7 +148,7 @@
    @@ -152,7 +157,7 @@

    Predicting with a model stack

    # S3 method for model_stack
    -predict(object, new_data, type = NULL, members = FALSE, opts = list(), ...)
+predict(object, new_data, type = NULL, members = FALSE, opts = list(), ...)

    Arguments

    @@ -234,133 +239,151 @@

    Examp # clarification on the data and tuning results # objects used in these examples! -data(tree_frogs_reg_test) -data(tree_frogs_class_test) +data(tree_frogs_reg_test) +data(tree_frogs_class_test) # build and fit a regression model stack -reg_st <- - stacks() %>% - add_candidates(reg_res_lr) %>% - add_candidates(reg_res_sp) %>% - blend_predictions() %>% - fit_members() - -reg_st +reg_st <- + stacks() %>% + add_candidates(reg_res_lr) %>% + add_candidates(reg_res_sp) %>% + blend_predictions() %>% + fit_members() + +reg_st
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 10 possible candidate members, the ensemble retained 3. -#> Lasso penalty: 1e-06.
    #> -#> The 3 highest weighted members are:
    #> # A tibble: 3 x 3 +#> Out of 10 possible candidate members, the ensemble retained 5. +#> Penalty: 1e-06. +#> Mixture: 1.
    #> +#> The 5 highest weighted members are:
    #> # A tibble: 5 x 3 #> member type weight #> <chr> <chr> <dbl> -#> 1 reg_res_lr_1_1 linear_reg 0.355 -#> 2 reg_res_sp_9_1 linear_reg 0.304 -#> 3 reg_res_sp_5_1 linear_reg 0.260
    +#> 1 reg_res_sp_2_1 linear_reg 0.346 +#> 2 reg_res_lr_1_1 linear_reg 0.238 +#> 3 reg_res_sp_4_1 linear_reg 0.224 +#> 4 reg_res_sp_8_1 linear_reg 0.101 +#> 5 reg_res_sp_9_1 linear_reg 0.0537
    # predict on the tree frogs testing data -predict(reg_st, tree_frogs_reg_test) +predict(reg_st, tree_frogs_reg_test)
    #> # A tibble: 143 x 1 #> .pred #> <dbl> -#> 1 40.3 -#> 2 111. -#> 3 90.6 -#> 4 33.7 -#> 5 75.3 -#> 6 90.0 -#> 7 122. -#> 8 82.4 -#> 9 37.5 -#> 10 77.3 +#> 1 115. +#> 2 31.7 +#> 3 93.7 +#> 4 122. +#> 5 167. +#> 6 95.2 +#> 7 125. +#> 8 222. +#> 9 167. +#> 10 156. #> # … with 133 more rows
    # include the predictions from the members -predict(reg_st, tree_frogs_reg_test, members = TRUE) -
    #> # A tibble: 143 x 4 -#> .pred reg_res_lr_1_1 reg_res_sp_5_1 reg_res_sp_9_1 -#> <dbl> <dbl> <dbl> <dbl> -#> 1 40.3 38.0 34.3 36.9 -#> 2 111. 124. 117. 98.6 -#> 3 90.6 84.5 92.4 98.2 -#> 4 33.7 35.3 28.0 23.8 -#> 5 75.3 79.0 77.0 67.6 -#> 6 90.0 83.7 92.2 97.3 -#> 7 122. 118. 139. 123. -#> 8 82.4 80.0 79.2 88.0 -#> 9 37.5 36.5 30.7 32.7 -#> 10 77.3 79.4 78.0 72.8 -#> # … with 133 more rows
    +predict(reg_st, tree_frogs_reg_test, members = TRUE) +
    #> # A tibble: 143 x 6 +#> .pred reg_res_lr_1_1 reg_res_sp_8_1 reg_res_sp_9_1 reg_res_sp_4_1 +#> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 115. 117. 101. 116. 118. +#> 2 31.7 34.2 26.7 27.8 27.5 +#> 3 93.7 111. 107. 84.1 83.2 +#> 4 122. 106. 112. 135. 132. +#> 5 167. 147. 161. 185. 178. +#> 6 95.2 85.7 98.5 98.8 102. +#> 7 125. 102. 115. 141. 135. +#> 8 222. 224. 210. 229. 231. +#> 9 167. 147. 160. 185. 179. +#> 10 156. 153. 156. 157. 161. +#> # … with 133 more rows, and 1 more variable: reg_res_sp_2_1 <dbl>
    # build and fit a classification model stack -class_st <- - stacks() %>% - add_candidates(class_res_nn) %>% - add_candidates(class_res_rf) %>% - blend_predictions() %>% - fit_members() -
    #> ! Bootstrap17: internal: No observations were detected in `truth` for level(s): 'low', ...
    -class_st +class_st <- + stacks() %>% + add_candidates(class_res_nn) %>% + add_candidates(class_res_rf) %>% + blend_predictions() %>% + fit_members() +
    #> ! Bootstrap08: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -73); ...
    #> ! Bootstrap14: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -76); ...
    #> ! Bootstrap24: preprocessor 1/1, model 1/1: from glmnet Fortran code (error code -92); ...
    +class_st
    #> ── A stacked ensemble model ─────────────────────────────────────
    #> -#> Out of 22 possible candidate members, the ensemble retained 3. -#> Lasso penalty: 0.1.
    #> Across the 3 classes, there are an average of 1.5 coefficients per class.
    #> -#> The 3 highest weighted member classes are:
    #> # A tibble: 3 x 4 -#> member type weight class -#> <chr> <chr> <dbl> <chr> -#> 1 .pred_full_class_res_nn_1_1 mlp 10.2 full -#> 2 .pred_mid_class_res_rf_1_02 rand_forest 0.888 mid -#> 3 .pred_full_class_res_rf_1_01 rand_forest 0.690 full
    +#> Out of 22 possible candidate members, the ensemble retained 10. +#> Penalty: 0.001. +#> Mixture: 1.
    #> Across the 3 classes, there are an average of 3.33 coefficients per class.
    #> +#> The 10 highest weighted member classes are:
    #> # A tibble: 10 x 4 +#> member type weight class +#> <chr> <chr> <dbl> <chr> +#> 1 .pred_full_class_res_nn_1_1 mlp 28.8 full +#> 2 .pred_mid_class_res_rf_1_01 rand_forest 10.9 mid +#> 3 .pred_mid_class_res_nn_1_1 mlp 7.82 mid +#> 4 .pred_mid_class_res_rf_1_04 rand_forest 5.76 low +#> 5 .pred_mid_class_res_rf_1_08 rand_forest 5.53 low +#> 6 .pred_mid_class_res_rf_1_07 rand_forest 4.48 low +#> 7 .pred_mid_class_res_rf_1_05 rand_forest 1.80 mid +#> 8 .pred_mid_class_res_rf_1_10 rand_forest 1.36 mid +#> 9 .pred_mid_class_res_rf_1_02 rand_forest 0.552 low +#> 10 .pred_full_class_res_rf_1_04 rand_forest 0.284 mid
    # predict reflex, first as a class, then as # class probabilities -predict(class_st, tree_frogs_class_test) +predict(class_st, tree_frogs_class_test)
 #> # A tibble: 303 x 1
 #>    .pred_class
 #>    <fct>
-#>  1 low
-#>  2 mid
+#>  1 full
+#>  2 low
 #>  3 low
-#>  4 mid
+#>  4 full
 #>  5 low
-#>  6 low
-#>  7 mid
+#>  6 mid
+#>  7 low
 #>  8 full
 #>  9 full
 #> 10 low
-#> # … with 293 more rows
-predict(class_st, tree_frogs_class_test, type = "prob")
+#> # … with 293 more rows
+predict(class_st, tree_frogs_class_test, type = "prob")
 #> # A tibble: 303 x 3
-#>    .pred_full .pred_low .pred_mid
-#>         <dbl>     <dbl>     <dbl>
-#>  1     0.909     0.0558    0.0348
-#>  2     0.123     0.486     0.390
-#>  3     0.911     0.0547    0.0340
-#>  4     0.129     0.522     0.349
-#>  5     0.911     0.0548    0.0341
-#>  6     0.911     0.0547    0.0341
-#>  7     0.124     0.485     0.390
-#>  8     0.106     0.417     0.478
-#>  9     0.0921    0.384     0.524
-#> 10     0.911     0.0547    0.0341
+#>     .pred_full .pred_low .pred_mid
+#>          <dbl>     <dbl>     <dbl>
+#>  1 0.000000592  0.269     0.731
+#>  2 0.999        0.000733  0.000223
+#>  3 0.999        0.000595  0.000227
+#>  4 0.00000156   0.256     0.744
+#>  5 0.999        0.000705  0.000223
+#>  6 0.000308     0.863     0.137
+#>  7 0.999        0.000595  0.000227
+#>  8 0.0000000651 0.143     0.857
+#>  9 0.00000449   0.343     0.657
+#> 10 0.999        0.000595  0.000227
 #> # … with 293 more rows
 # returning the member predictions as well
-predict(
-  class_st,
-  tree_frogs_class_test,
-  type = "prob",
-  members = TRUE
-)
-#> # A tibble: 303 x 12
-#>    .pred_full .pred_low .pred_mid .pred_low_class… .pred_low_class…
-#>         <dbl>     <dbl>     <dbl>            <dbl>            <dbl>
-#>  1     0.909     0.0558    0.0348            0                0.212
-#>  2     0.123     0.486     0.390             0.715            0.422
-#>  3     0.911     0.0547    0.0340            0                0.212
-#>  4     0.129     0.522     0.349             0.920            0.460
-#>  5     0.911     0.0548    0.0341            0                0.212
-#>  6     0.911     0.0547    0.0341            0                0.212
-#>  7     0.124     0.485     0.390             0.712            0.402
-#>  8     0.106     0.417     0.478             0.314            0.423
-#>  9     0.0921    0.384     0.524             0.116            0.285
-#> 10     0.911     0.0547    0.0341            0                0.212
-#> # … with 293 more rows, and 7 more variables:
-#> #   .pred_low_class_res_rf_1_01 <dbl>, .pred_mid_class_res_rf_1_02 <dbl>,
-#> #   .pred_mid_class_res_nn_1_1 <dbl>, .pred_mid_class_res_rf_1_01 <dbl>,
-#> #   .pred_full_class_res_rf_1_02 <dbl>, .pred_full_class_res_nn_1_1 <dbl>,
-#> #   .pred_full_class_res_rf_1_01 <dbl>
-# }
+predict(
+  class_st,
+  tree_frogs_class_test,
+  type = "prob",
+  members = TRUE
+)
+#> # A tibble: 303 x 27
+#>     .pred_full .pred_low .pred_mid .pred_low_class_res_r… .pred_low_class_res_…
+#>          <dbl>     <dbl>     <dbl>                  <dbl>                 <dbl>
+#>  1 0.000000592  0.269     0.731                    0.468               0.313
+#>  2 0.999        0.000733  0.000223                 0.102               0.000619
+#>  3 0.999        0.000595  0.000227                 0.0458              0.0000355
+#>  4 0.00000156   0.256     0.744                    0.509               0.444
+#>  5 0.999        0.000705  0.000223                 0.109               0
+#>  6 0.000308     0.863     0.137                    0.519               0.603
+#>  7 0.999        0.000595  0.000227                 0.0458              0.0000355
+#>  8 0.0000000651 0.143     0.857                    0.283               0.167
+#>  9 0.00000449   0.343     0.657                    0.509               0.486
+#> 10 0.999        0.000595  0.000227                 0.0458              0.0000355
+#> # … with 293 more rows, and 22 more variables:
+#> #   .pred_low_class_res_rf_1_08 <dbl>, .pred_low_class_res_rf_1_07 <dbl>,
+#> #   .pred_low_class_res_nn_1_1 <dbl>, .pred_low_class_res_rf_1_10 <dbl>,
+#> #   .pred_low_class_res_rf_1_05 <dbl>, .pred_low_class_res_rf_1_01 <dbl>,
+#> #   .pred_mid_class_res_rf_1_04 <dbl>, .pred_mid_class_res_rf_1_02 <dbl>,
+#> #   .pred_mid_class_res_rf_1_08 <dbl>, .pred_mid_class_res_rf_1_07 <dbl>,
+#> #   .pred_mid_class_res_nn_1_1 <dbl>, .pred_mid_class_res_rf_1_10 <dbl>,
+#> #   .pred_mid_class_res_rf_1_05 <dbl>, .pred_mid_class_res_rf_1_01 <dbl>,
+#> #   .pred_full_class_res_rf_1_04 <dbl>, .pred_full_class_res_rf_1_02 <dbl>,
+#> #   .pred_full_class_res_rf_1_08 <dbl>, .pred_full_class_res_rf_1_07 <dbl>,
+#> #   .pred_full_class_res_nn_1_1 <dbl>, .pred_full_class_res_rf_1_10 <dbl>,
+#> #   .pred_full_class_res_rf_1_05 <dbl>, .pred_full_class_res_rf_1_01 <dbl>
+# }
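For context, the example that generates the output above follows the pattern below. This is a minimal sketch, assuming `class_st` is a classification model stack whose members have already been fitted via `fit_members()` and that `tree_frogs_class_test` is held-out test data:

library(stacks)

# hard class predictions for each test observation
predict(class_st, tree_frogs_class_test, type = "class")

# class probabilities for each level: full, low, mid
predict(class_st, tree_frogs_class_test, type = "prob")

# members = TRUE appends each member model's predictions alongside
# the ensemble's, one column per member and class (hence 303 x 27)
predict(class_st, tree_frogs_class_test, type = "prob", members = TRUE)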
diff --git a/docs/reference/prediction_eqn.html b/docs/reference/prediction_eqn.html
index 688475e4..b9b7b753 100644
--- a/docs/reference/prediction_eqn.html
+++ b/docs/reference/prediction_eqn.html
@@ -95,7 +95,7 @@ stacks
@@ -103,7 +103,7 @@
@@ -143,7 +148,7 @@
@@ -151,23 +156,23 @@

Convert one or more linear predictors to a format used for prediction

Convert one or more linear predictors to a format used for prediction

-prediction_eqn(x, ...)
+prediction_eqn(x, ...)
     
     # S3 method for `_lognet`
    -prediction_eqn(x, type = "class", ...)
    +prediction_eqn(x, type = "class", ...)
     
     # S3 method for `_elnet`
    -prediction_eqn(x, type = "numeric", ...)
    +prediction_eqn(x, type = "numeric", ...)
     
     # S3 method for `_multnet`
    -prediction_eqn(x, type = "class", ...)
    +prediction_eqn(x, type = "class", ...)
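The usage block above is plain S3 dispatch on the subclass of the underlying glmnet fit (`_lognet`, `_elnet`, `_multnet`). A hypothetical sketch of the pattern, with placeholder bodies rather than the package's actual internals:

# generic: dispatch on the class of the fitted coefficient model
prediction_eqn <- function(x, ...) UseMethod("prediction_eqn")

# e.g., a binomial (lognet) fit defaults to class predictions
`prediction_eqn._lognet` <- function(x, type = "class", ...) {
  # placeholder: a real method would reshape the glmnet coefficients
  # into a structure that stack_predict() can later evaluate
  structure(list(fit = x, type = type), class = "lognet_class")
}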

    Arguments

-
+
diff --git a/docs/reference/reexports.html b/docs/reference/reexports.html
index f9f990df..4182110b 100644
--- a/docs/reference/reexports.html
+++ b/docs/reference/reexports.html
@@ -104,7 +104,7 @@ stacks
@@ -112,7 +112,7 @@
@@ -152,7 +157,7 @@
@@ -160,7 +165,7 @@

    Objects exported from other packages

    These objects are imported from other packages. Follow the links below to see their documentation.

-butcher
-axe_call, axe_ctrl, axe_data, axe_env, axe_fitted, butcher
+butcher
+axe_call, axe_ctrl, axe_data, axe_env, axe_fitted, butcher

    dplyr

    %>%
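These re-exports matter in practice: the butcher axe generics let a heavy fitted stack be slimmed before serialization. A hedged sketch, reusing the hypothetical `class_st` object from the predict example above (the exact savings depend on the member models):

library(stacks)

# butcher() is re-exported by stacks; axe away components that
# are not needed for prediction before saving the fitted stack
slim_st <- butcher(class_st)
saveRDS(slim_st, "class_st.rds")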

diff --git a/docs/reference/stack_predict.html b/docs/reference/stack_predict.html
index 0d3521ec..82c83ffd 100644
--- a/docs/reference/stack_predict.html
+++ b/docs/reference/stack_predict.html
@@ -95,7 +95,7 @@ stacks
@@ -103,7 +103,7 @@
@@ -143,7 +148,7 @@
@@ -151,22 +156,22 @@

Convert one or more linear predictors to a format used for prediction

Convert one or more linear predictors to a format used for prediction

-stack_predict(x, ...)
+stack_predict(x, ...)
     
     # S3 method for elnet_numeric
    -stack_predict(x, data, ...)
    +stack_predict(x, data, ...)
     
     # S3 method for lognet_class
    -stack_predict(x, data, ...)
    +stack_predict(x, data, ...)
     
     # S3 method for lognet_prob
    -stack_predict(x, data, ...)
    +stack_predict(x, data, ...)
     
     # S3 method for multnet_class
    -stack_predict(x, data, ...)
    +stack_predict(x, data, ...)
     
     # S3 method for multnet_prob
    -stack_predict(x, data, ...)
    +stack_predict(x, data, ...)
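stack_predict() then dispatches on those prepared-equation classes (elnet_numeric, lognet_class, and so on). Conceptually, the numeric case is a regularized linear combination of the member predictions; the snippet below is a simplified, hypothetical illustration of that idea, not the package's implementation:

# hypothetical: evaluate a numeric stacking equation by hand, given
# an intercept, blending coefficients, and one column per member
combine_members <- function(intercept, weights, member_preds) {
  # member_preds: data frame of member predictions, one row per case
  intercept + as.matrix(member_preds) %*% weights
}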

    Arguments

    x

    An object that uses a glmnet::glmnet() model and all numeric predictors.

    An object that uses a glmnet::glmnet() model and all numeric predictors.

    ...
diff --git a/docs/reference/stacks.html b/docs/reference/stacks.html
index f967fc57..3ae19ae3 100644
--- a/docs/reference/stacks.html
+++ b/docs/reference/stacks.html
@@ -105,7 +105,7 @@ stacks
@@ -113,7 +113,7 @@
@@ -153,7 +158,7 @@

    Initialize a Stack

    and the basics vignette for a detailed walk-through of functionality.

-stacks(...)
+stacks(...)
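stacks() is the entry point for the whole workflow: it initializes an empty data stack onto which candidates are added, blended, and fitted. A minimal sketch using the package's example candidates (class_res_nn and class_res_rf, tuning results that ship with stacks):

library(stacks)

class_st <-
  stacks() %>%                      # initialize an empty data stack
  add_candidates(class_res_nn) %>%  # add tuning results as candidates
  add_candidates(class_res_rf) %>%
  blend_predictions() %>%           # estimate stacking coefficients
  fit_members()                     # fit the retained members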

    Arguments

diff --git a/docs/reference/stacks_description.html b/docs/reference/stacks_description.html
index fc882268..5258b5d0 100644
--- a/docs/reference/stacks_description.html
+++ b/docs/reference/stacks_description.html
@@ -51,7 +51,7 @@ stacks
@@ -108,7 +108,7 @@
@@ -148,13 +153,13 @@


-Model stacking is an ensembling technique
+Model stacking is an ensemble technique
 that involves training a model to combine the outputs of many
 diverse statistical models, and has been shown to improve
 predictive performance in a variety of settings. 'stacks'
@@ -163,6 +168,15 @@

    stacks: Tidy Model Stacking

+
+See also
+
+

    Author

    Maintainer: Simon Couch simonpatrickcouch@gmail.com

diff --git a/docs/reference/tree_frogs.html b/docs/reference/tree_frogs.html
index 856728f9..da87b0a5 100644
--- a/docs/reference/tree_frogs.html
+++ b/docs/reference/tree_frogs.html
@@ -105,7 +105,7 @@ stacks
@@ -113,7 +113,7 @@
@@ -153,7 +158,7 @@
@@ -171,7 +176,7 @@

    Tree frog embryo hatching data

    factors inform whether an embryo hatches prematurely or not!

-tree_frogs
+tree_frogs

    Format

    @@ -202,7 +207,9 @@

Format
Source

-https://www.biorxiv.org/content/10.1101/2020.09.18.304295v1
+Julie Jung et al. (2020) Multimodal mechanosensing enables treefrog
+embryos to escape egg-predators.
+https://doi.org/10.1242/jeb.236141
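The corrected citation is easy to verify from an R session, since the dataset ships with the package; for instance:

library(stacks)

# load the experimental data and inspect its structure
data("tree_frogs")
str(tree_frogs)

# the help page now carries the \doi{10.1242/jeb.236141} source
?tree_frogs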

    Details

Note that the data included with the stacks package is not necessarily
diff --git a/docs/tidyverse.css b/docs/tidyverse.css
index 0a23ebae..17bbedc1 100644
--- a/docs/tidyverse.css
+++ b/docs/tidyverse.css
@@ -5,7 +5,7 @@
  * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)
  */
 /*! normalize.css v3.0.3 | MIT License | github.com/necolas/normalize.css */
-@import url("https://fonts.googleapis.com/css?family=Source+Code+Pro:300,400,700|Source+Sans+Pro:300,400,700");
+@import url("https://fonts.googleapis.com/css?family=Source+Code+Pro:300,400,500,700|Source+Sans+Pro:300,400,700");
 html {
   font-family: sans-serif;
   -ms-text-size-adjust: 100%;
@@ -1484,7 +1484,7 @@ pre {
   word-break: break-all;
   word-wrap: break-word;
   color: #212121;
-  background-color: #f5f5f5;
+  background-color: #fbfbfb !important;
   border: 1px solid #ccc;
   border-radius: 3px; }
   pre code {
@@ -1495,6 +1495,12 @@ pre {
     background-color: transparent;
     border-radius: 0; }
 
+/* Commented code */
+span.co {
+  color: #00825d;
+  font-weight: 500;
+}
+
 .pre-scrollable {
   max-height: 340px;
   overflow-y: scroll; }
diff --git a/man/example_data.Rd b/man/example_data.Rd
index 560d9467..aafd3ca9 100644
--- a/man/example_data.Rd
+++ b/man/example_data.Rd
@@ -36,8 +36,7 @@ An object of class \code{tune_results} (inherits from \code{tbl_df}, \code{tbl},
 }
 \source{
 Julie Jung et al. (2020) Multimodal mechanosensing enables treefrog
-embryos to escape egg-predators.
-\url{https://doi.org/10.1242/jeb.236141}
+embryos to escape egg-predators. \doi{10.1242/jeb.236141}
 }
 \usage{
 reg_res_svm
diff --git a/man/tree_frogs.Rd b/man/tree_frogs.Rd
index ed90bbd2..fdbe49d4 100644
--- a/man/tree_frogs.Rd
+++ b/man/tree_frogs.Rd
@@ -31,8 +31,7 @@ in seconds.)}
 }
 \source{
 Julie Jung et al. (2020) Multimodal mechanosensing enables treefrog
-embryos to escape egg-predators.
-\url{https://doi.org/10.1242/jeb.236141}
+embryos to escape egg-predators. \doi{10.1242/jeb.236141}
 }
 \usage{
 tree_frogs
diff --git a/tests/testthat/helper_data.Rda b/tests/testthat/helper_data.Rda
index 7a76e133..abb44b77 100644
Binary files a/tests/testthat/helper_data.Rda and b/tests/testthat/helper_data.Rda differ
diff --git a/tests/testthat/out/model_stack_class.txt b/tests/testthat/out/model_stack_class.txt
index 88ba7152..69f7056f 100644
--- a/tests/testthat/out/model_stack_class.txt
+++ b/tests/testthat/out/model_stack_class.txt
@@ -4,7 +4,7 @@
 Message: -- A stacked ensemble model -------------------------------------
 Message: Out of 20 possible candidate members, the ensemble retained 13.
 Penalty: 1e-04.
-Mixture: 1.
+Mixture: .
 Message: Across the 3 classes, there are an average of 4.33 coefficients per class.
@@ -14,16 +14,16 @@ The 10 highest weighted member classes are:
 # A tibble: 10 x 4
   member                       type        weight class
- 1 .pred_mid_class_res_rf_1_04  rand_forest  28.2  low
- 2 .pred_mid_class_res_rf_1_06  rand_forest  16.0  mid
- 3 .pred_mid_class_res_rf_1_01  rand_forest  15.9  mid
- 4 .pred_full_class_res_rf_1_05 rand_forest  12.2  full
- 5 .pred_mid_class_res_rf_1_10  rand_forest  11.5  mid
- 6 .pred_mid_class_res_rf_1_07  rand_forest  11.2  low
- 7 .pred_mid_class_res_rf_1_08  rand_forest   7.21 low
- 8 .pred_mid_class_res_rf_1_02  rand_forest   6.58 low
- 9 .pred_full_class_res_rf_1_04 rand_forest   4.86 low
-10 .pred_full_class_res_rf_1_10 rand_forest   4.77 mid
+ 1 .pred_mid_class_res_rf_1_04  rand_forest  27.5  low
+ 2 .pred_mid_class_res_rf_1_01  rand_forest  16.1  mid
+ 3 .pred_mid_class_res_rf_1_06  rand_forest  15.2  mid
+ 4 .pred_full_class_res_rf_1_05 rand_forest  11.9  full
+ 5 .pred_mid_class_res_rf_1_07  rand_forest  11.6  low
+ 6 .pred_mid_class_res_rf_1_10  rand_forest  11.0  mid
+ 7 .pred_mid_class_res_rf_1_08  rand_forest   7.20 low
+ 8 .pred_mid_class_res_rf_1_02  rand_forest   5.92 low
+ 9 .pred_full_class_res_rf_1_04 rand_forest   4.40 low
+10 .pred_full_class_res_rf_1_10 rand_forest   4.17 mid
 Message: Members have not yet been fitted with `fit_members()`.
diff --git a/tests/testthat/out/model_stack_class_fit.txt b/tests/testthat/out/model_stack_class_fit.txt
index e1e8a062..4953923d 100644
--- a/tests/testthat/out/model_stack_class_fit.txt
+++ b/tests/testthat/out/model_stack_class_fit.txt
@@ -4,7 +4,7 @@
 Message: -- A stacked ensemble model -------------------------------------
 Message: Out of 20 possible candidate members, the ensemble retained 13.
 Penalty: 1e-04.
-Mixture: 1.
+Mixture: .
 Message: Across the 3 classes, there are an average of 4.33 coefficients per class.
@@ -14,14 +14,14 @@ The 10 highest weighted member classes are:
 # A tibble: 10 x 4
   member                       type        weight class
- 1 .pred_mid_class_res_rf_1_04  rand_forest  28.2  low
- 2 .pred_mid_class_res_rf_1_06  rand_forest  16.0  mid
- 3 .pred_mid_class_res_rf_1_01  rand_forest  15.9  mid
- 4 .pred_full_class_res_rf_1_05 rand_forest  12.2  full
- 5 .pred_mid_class_res_rf_1_10  rand_forest  11.5  mid
- 6 .pred_mid_class_res_rf_1_07  rand_forest  11.2  low
- 7 .pred_mid_class_res_rf_1_08  rand_forest   7.21 low
- 8 .pred_mid_class_res_rf_1_02  rand_forest   6.58 low
- 9 .pred_full_class_res_rf_1_04 rand_forest   4.86 low
-10 .pred_full_class_res_rf_1_10 rand_forest   4.77 mid
+ 1 .pred_mid_class_res_rf_1_04  rand_forest  27.5  low
+ 2 .pred_mid_class_res_rf_1_01  rand_forest  16.1  mid
+ 3 .pred_mid_class_res_rf_1_06  rand_forest  15.2  mid
+ 4 .pred_full_class_res_rf_1_05 rand_forest  11.9  full
+ 5 .pred_mid_class_res_rf_1_07  rand_forest  11.6  low
+ 6 .pred_mid_class_res_rf_1_10  rand_forest  11.0  mid
+ 7 .pred_mid_class_res_rf_1_08  rand_forest   7.20 low
+ 8 .pred_mid_class_res_rf_1_02  rand_forest   5.92 low
+ 9 .pred_full_class_res_rf_1_04 rand_forest   4.40 low
+10 .pred_full_class_res_rf_1_10 rand_forest   4.17 mid
diff --git a/tests/testthat/out/model_stack_log.txt b/tests/testthat/out/model_stack_log.txt
index ae13b3b6..3dcd3e43 100644
--- a/tests/testthat/out/model_stack_log.txt
+++ b/tests/testthat/out/model_stack_log.txt
@@ -4,7 +4,7 @@
 Message: -- A stacked ensemble model -------------------------------------
 Message: Out of 10 possible candidate members, the ensemble retained 4.
 Penalty: 1e-06.
-Mixture: 1.
+Mixture: .
 Message: The 4 highest weighted member classes are:
@@ -12,10 +12,10 @@ The 4 highest weighted member classes are:
 # A tibble: 4 x 3
   member                    type        weight
-1 .pred_yes_log_res_rf_1_09 rand_forest  4.65
-2 .pred_yes_log_res_rf_1_03 rand_forest  1.22
-3 .pred_yes_log_res_rf_1_06 rand_forest  0.689
-4 .pred_yes_log_res_rf_1_05 rand_forest  0.479
+1 .pred_yes_log_res_rf_1_09 rand_forest  4.64
+2 .pred_yes_log_res_rf_1_03 rand_forest  1.25
+3 .pred_yes_log_res_rf_1_06 rand_forest  0.687
+4 .pred_yes_log_res_rf_1_05 rand_forest  0.467
 Message: Members have not yet been fitted with `fit_members()`.
diff --git a/tests/testthat/out/model_stack_log_fit.txt b/tests/testthat/out/model_stack_log_fit.txt
index 72e5ff50..6078ddda 100644
--- a/tests/testthat/out/model_stack_log_fit.txt
+++ b/tests/testthat/out/model_stack_log_fit.txt
@@ -4,7 +4,7 @@
 Message: -- A stacked ensemble model -------------------------------------
 Message: Out of 10 possible candidate members, the ensemble retained 4.
 Penalty: 1e-06.
-Mixture: 1.
+Mixture: .
 Message: The 4 highest weighted member classes are:
@@ -12,8 +12,8 @@ The 4 highest weighted member classes are:
 # A tibble: 4 x 3
   member                    type        weight
-1 .pred_yes_log_res_rf_1_09 rand_forest  4.65
-2 .pred_yes_log_res_rf_1_03 rand_forest  1.22
-3 .pred_yes_log_res_rf_1_06 rand_forest  0.689
-4 .pred_yes_log_res_rf_1_05 rand_forest  0.479
+1 .pred_yes_log_res_rf_1_09 rand_forest  4.64
+2 .pred_yes_log_res_rf_1_03 rand_forest  1.25
+3 .pred_yes_log_res_rf_1_06 rand_forest  0.687
+4 .pred_yes_log_res_rf_1_05 rand_forest  0.467
diff --git a/tests/testthat/out/model_stack_reg.txt b/tests/testthat/out/model_stack_reg.txt
index d75d2c5a..cb5d95d9 100644
--- a/tests/testthat/out/model_stack_reg.txt
+++ b/tests/testthat/out/model_stack_reg.txt
@@ -4,7 +4,7 @@
 Message: -- A stacked ensemble model -------------------------------------
 Message: Out of 5 possible candidate members, the ensemble retained 2.
 Penalty: 0.1.
-Mixture: 1.
+Mixture: .
 Message: The 2 highest weighted members are:
@@ -12,8 +12,8 @@ The 2 highest weighted members are:
 # A tibble: 2 x 3
   member          type    weight
-1 reg_res_svm_1_1 svm_rbf  0.697
-2 reg_res_svm_1_3 svm_rbf  0.499
+1 reg_res_svm_1_3 svm_rbf  1.19
+2 reg_res_svm_1_1 svm_rbf  0.159
 Message: Members have not yet been fitted with `fit_members()`.
diff --git a/tests/testthat/out/model_stack_reg_fit.txt b/tests/testthat/out/model_stack_reg_fit.txt
index 8bd339ce..287dce96 100644
--- a/tests/testthat/out/model_stack_reg_fit.txt
+++ b/tests/testthat/out/model_stack_reg_fit.txt
@@ -4,7 +4,7 @@
 Message: -- A stacked ensemble model -------------------------------------
 Message: Out of 5 possible candidate members, the ensemble retained 2.
 Penalty: 0.1.
-Mixture: 1.
+Mixture: .
 Message: The 2 highest weighted members are:
@@ -12,6 +12,6 @@ The 2 highest weighted members are:
 # A tibble: 2 x 3
   member          type    weight
-1 reg_res_svm_1_1 svm_rbf  0.697
-2 reg_res_svm_1_3 svm_rbf  0.499
+1 reg_res_svm_1_3 svm_rbf  1.19
+2 reg_res_svm_1_1 svm_rbf  0.159
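These snapshots exercise the model stack print method, which reports the penalty, the mixture, and the highest weighted members retained by the blending model. The same summary can be reproduced interactively; a sketch, reusing the hypothetical `class_st` stack from the earlier examples (and assuming its random forest candidates were added under the name class_res_rf):

# printing a model stack summarizes the blending model
class_st

# pull the stacking coefficients for one set of candidates
collect_parameters(class_st, "class_res_rf")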