rank_results() bug: Warning: Unknown or uninitialised column: result. AND(!?)✖ Column metric is not found. #136

Closed
PathosEthosLogos opened this issue Dec 26, 2023 · 3 comments · Fixed by #137

Comments

@PathosEthosLogos

suppressPackageStartupMessages(suppressWarnings({
  library(tidytable)
  library(tidymodels)
  
  conflicted::conflict_prefer_all(winner = "tidytable",
                                  quiet = TRUE)
}))

# Create training data with a bit of preprocessing included
df_train = attenu |> 
  select(-station)

# Set up model formula
rec_formula = df_train |> 
  recipe(dist ~ .) |> 
  prep()

# Set up cross validation for time series
resamples = df_train |> 
  sliding_index(event,
                lookback = 3)

# Set hyperparameter tuning algorithm specification
spec_enet = linear_reg(penalty = tune(), mixture = tune()) |> 
  set_engine('glmnet')

# For performance metrics
metrics = metric_set(huber_loss, rmse, mae, smape)

# Put the model steps together
wf_enet = workflow(preprocessor = rec_formula,
                   spec = spec_enet)

# Run the models and time it
system.time({
  tuned_model = wf_enet |> 
    tune_grid(grid = 5,
              resamples = resamples,
              metrics = metrics,
              control = control_resamples(save_pred = T))
})
#>    user  system elapsed 
#>   42.41    0.61   43.03

# Pick the best model
best_models = rank_results(tuned_model,
                           rank_metric = huber_loss,
                           select_best = T)
#> Warning: Unknown or uninitialised column: `result`.
#> Error in `dplyr::group_by()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `metric` is not found.
#> Backtrace:
#>     ▆
#>  1. ├─workflowsets::rank_results(...)
#>  2. │ └─workflowsets:::pick_metric(x, rank_metric)
#>  3. │   └─workflowsets:::collate_metrics(x)
#>  4. │     └─metrics %>% dplyr::group_by(metric) %>% ...
#>  5. ├─dplyr::summarize(...)
#>  6. ├─dplyr::group_by(., metric)
#>  7. └─dplyr:::group_by.data.frame(., metric)
#>  8.   └─dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
#>  9.     └─rlang::abort(bullets, call = error_call)

Created on 2023-12-26 with reprex v2.0.2

@PathosEthosLogos
Author

I dug into this a bit. rank_results() refers to the column result in several places when it is given the output of tune_grid().

First, it runs into a problem in collate_metrics(), defined in misc.R.

Then the column is referenced twice more in the body of rank_results():

  types <- x %>%
    full_join(wflow_info, by = "wflow_id") %>%
    mutate(
      is_race = map_lgl(result, ~ inherits(.x, "tune_race")),
      num_rs = map_int(result, get_num_resamples)
    ) %>%
    select(wflow_id, is_race, num_rs)
  ranked <- full_join(results, types, by = "wflow_id") %>%
    filter(.metric == metric)

It seems that tune_grid() should be creating the column result, but it does not. Did a recent update change this? It was working before.
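For reference, the warning in the reprex is the one a tibble emits when `$` is used on a column that does not exist; the access then returns NULL instead of failing. A minimal sketch, assuming a made-up stand-in tibble (`tuning_like` is hypothetical and only mimics the shape of a tuning result, it is not real tune_grid() output):

```r
library(tibble)

# Hypothetical stand-in: a tibble with no `result` column
tuning_like <- tibble(
  id = c("Slice1", "Slice2"),
  .metrics = list(NULL, NULL)
)

# Capture the warning message instead of printing it
caught <- NULL
value <- withCallingHandlers(
  tuning_like$result,
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

is.null(value)  # the missing column comes back as NULL
caught          # the warning text tibble produced
```

Any code that later expects a list of results in that column will then fail further downstream, which matches the backtrace ending in group_by().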

@simonpcouch
Contributor

Thanks for the issue @PathosEthosLogos!

tune_grid() doesn't create a column called result and hasn't before; workflow_map() creates that column, and its entries are the outputs of each call to tune_grid().
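As an illustration of where the result column comes from, here is a small sketch that builds a workflow set and tunes it with workflow_map(). The data (mtcars), the two formulas, and the object names are assumptions chosen to keep the example quick; they are not taken from the reprex above.

```r
library(tidymodels)  # attaches workflowsets, tune, rsample, parsnip, yardstick

set.seed(136)
folds <- vfold_cv(mtcars, v = 5)

spec_enet <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")

# A workflow set pairs each preprocessor with each model
wf_set <- workflow_set(
  preproc = list(all_predictors = mpg ~ ., two_predictors = mpg ~ wt + hp),
  models = list(enet = spec_enet)
)

# workflow_map() calls tune_grid() once per workflow and stores
# each tuning result in the `result` list-column
wf_results <- workflow_map(
  wf_set,
  fn = "tune_grid",
  resamples = folds,
  grid = 5,
  metrics = metric_set(huber_loss, rmse),
  seed = 136
)

names(wf_results)

# rank_results() works here because its input is a workflow set
ranked <- rank_results(wf_results, rank_metric = "huber_loss", select_best = TRUE)
ranked
```

Note that rank_metric is passed as a character string in this sketch.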

The issue you're seeing here is that rank_results() is a function defined for workflow sets and doesn't know what to do with a tuning result like tuned_model. In workflowsets, we should check inputs more carefully and raise a more informative error there.
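A rough sketch of what such an input check could look like, in base R only. The helper name `check_workflow_set()` and the wording of the message are hypothetical and are not the code that workflowsets actually uses; the class names "workflow_set" and "tune_results" are the real ones.

```r
# Hypothetical helper: fail early when `x` is not a workflow set
check_workflow_set <- function(x, arg = "x") {
  if (!inherits(x, "workflow_set")) {
    stop(
      "`", arg, "` must be a workflow set, not an object of class <",
      class(x)[1], ">. ",
      "For tuning results, see `collect_metrics()` or `select_best()`.",
      call. = FALSE
    )
  }
  invisible(x)
}

# Stand-in objects that only carry the relevant class attribute
fake_tune_results <- structure(data.frame(), class = c("tune_results", "data.frame"))
fake_workflow_set <- structure(data.frame(), class = c("workflow_set", "data.frame"))

# The tuning result is rejected with a readable message
msg <- tryCatch(
  check_workflow_set(fake_tune_results),
  error = function(e) conditionMessage(e)
)
msg

# The workflow set passes through unchanged
passed <- check_workflow_set(fake_workflow_set)
```

Checking the class up front would replace the confusing "Column `metric` is not found" error with one that names the actual problem.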

If you want output similar to rank_results() output for tuning results, you could try select_best() or collect_metrics()!
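A minimal sketch of those alternatives, assuming a small tuning run on mtcars rather than the attenu data from the reprex (the data set, fold setup, and object names are illustrative assumptions):

```r
library(tidymodels)

set.seed(136)
folds <- vfold_cv(mtcars, v = 5)

spec_enet <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")

wf_enet <- workflow(
  preprocessor = recipe(mpg ~ ., data = mtcars),
  spec = spec_enet
)

tuned_model <- tune_grid(
  wf_enet,
  resamples = folds,
  grid = 5,
  metrics = metric_set(huber_loss, rmse)
)

# All metrics for every candidate, averaged over resamples
all_metrics <- collect_metrics(tuned_model)

# Manual ranking on one metric (lower huber_loss is better)
ranked <- all_metrics |>
  dplyr::filter(.metric == "huber_loss") |>
  dplyr::arrange(mean)

# Built-in helpers: top candidates, and the single best parameter set
top_five <- show_best(tuned_model, metric = "huber_loss", n = 5)
best_params <- select_best(tuned_model, metric = "huber_loss")
best_params
```

The metric is given as a character string here. best_params can then be passed to finalize_workflow() to fix the tuned parameters in the workflow.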

@simonpcouch
Contributor

Related to #131.
