-
-
Notifications
You must be signed in to change notification settings - Fork 84
add fun_avg to ppc_avg functions #349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #349 +/- ##
==========================================
- Coverage 98.60% 98.58% -0.03%
==========================================
Files 35 35
Lines 5539 5589 +50
==========================================
+ Hits 5462 5510 +48
- Misses 77 79 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the fast reaction. This looks like a clean implementation to address the listed issue. My only comment is about the documentation, which could mention the expected format of fun_avg
.
R/ppc-errors.R
Outdated
#' @param fun_avg Function to apply to compute the posterior average. | ||
#' Defaults to `"mean"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ppc_stat
functions, we have a similar argument, by the name stat
, which does a very similar job:
#' @param stat A single function or a string naming a function, except for the
#' 2D plot which requires a vector of exactly two names or functions. In all
#' cases the function(s) should take a vector input and return a scalar
#' statistic. If specified as a string (or strings) then the legend will
#' display the function name(s). If specified as a function (or functions)
#' then generic naming is used in the legend.
We could align this doc to read, for example:
#' @param fun_avg A function or a string naming a function for computing the
#' posterior average. In both cases, the function should take a vector input and
#' return a scalar statistic. If specified as a string, then the legend will
#' display the function name. If specified as a function
#' then generic naming is used in the legend.
#' Defaults to"mean"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Average y - y_rep
axis label is not affected. I didn't want to make yrep_avg_label()
and error_avg_label()
depend on fun_avg
or change the default "Average y - y_rep" labels.
It does affect the $rep_label
in ppc_scatter_avg_data(y, yrep)
when fun_avg is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, i'm just noticing Aki's comment. I'll switch to stat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tjmahr. The code looks good. And I agree with changing to stat
R/ppc-scatterplots.R
Outdated
#' @param fun_avg Function to apply to compute the posterior average. | ||
#' Defaults to `"mean"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above.
I was also thinking It would be good to change the y-axis label to show the |
Following up on @avehtari , also would be good to change the x-axis label to show the |
Because the function name is used for both labeling and invoking a function name, I added library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
y <- l$y
yrep <- l$yrep
x <- l$x stat is inlined into the axis label ppc_scatter_avg(y, yrep, stat = "mean") stat can be a function ppc_scatter_avg(y, yrep, stat = median) or a symbol ppc_scatter_avg(y, yrep, stat = quote(median)) stat can be a primitive function ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min) stat can be an anonymous function, but it gets a generic label ppc_scatter_avg(y, yrep, stat = function(x) quantile(x, .1)) Created on 2025-05-14 with reprex v2.1.1 |
If I'm not wrong, this is more challenging, as when passing |
No, you can capture that. ( library(rlang)
library(ggplot2)
f <- function(x, y) {
qx <- enquo(x)
qy <- enquo(y)
data <- data.frame(x = x, y = y)
ggplot(data) +
aes(x, y) +
geom_point() +
labs(
x = as_label(quo_get_expr(qx)),
y = as_label(quo_get_expr(qy))
)
}
data <- data.frame(
y = bayesplot::example_y_data(),
x = bayesplot::example_x_data()
)
f(data$x, data$y) Created on 2025-05-15 with reprex v2.1.1 The hard part for us is that when you call My update yesterday kept track of the expression for Edit: I figured out how to handle the inner function problem by using quosures/tunnelling. |
library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
y <- l$y
yrep <- l$yrep
x <- l$x Function name ppc_scatter_avg(y, yrep, stat = "mean") ppc_error_scatter_avg(y, yrep, stat = "mean") Function object ppc_scatter_avg(y, yrep, stat = median) ppc_error_scatter_avg(y, yrep, stat = median) Primitive function ppc_scatter_avg(y, yrep, x, stat = min) ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min) Anonymous function ppc_scatter_avg(y, yrep, x, stat = function(x) quantile(x, .1)) ppc_error_scatter_avg_vs_x(y, yrep, x, stat = function(x) quantile(x, .1)) Anonymous function (formulas) ppc_scatter_avg(y, yrep, x, stat = ~ quantile(.x, .1)) ppc_error_scatter_avg_vs_x(y, yrep, x, stat = ~ quantile(.x, .1)) Created on 2025-05-15 with reprex v2.1.1 |
- avoid global for y, italic
This patch should be ready for review. The expression for library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x, stat = "sd") Created on 2025-05-16 with reprex v2.1.1 |
stat(y - y_rep) is what is computed internally. |
@tjmahr sorry for the delay in reviewing this. Been a busy few days. Will try to get to it soon! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! I made a few minor comments/questions. Aside from those my only other question is does it makes sense to also use this new approach with the stat
argument to ppc_stat()
? You could do something similar to what you've done with the axis labels here but for the legend for ppc_stat()
. What do you think?
@@ -26,7 +26,7 @@ URL: https://mc-stan.org/bayesplot/ | |||
BugReports: https://github.com/stan-dev/bayesplot/issues/ | |||
SystemRequirements: pandoc (>= 1.12.3), pandoc-citeproc | |||
Depends: | |||
R (>= 3.1.0) | |||
R (>= 4.1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at this point it's been out long enough that bumping the required R version is fine.
#' containing the expression to represent the function name and an | ||
#' `"is_anonymous_function"` attribute to flag if the expression is a call to | ||
#' `function()`. | ||
as_tagged_function <- function(f = NULL, fallback = "func") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function works as described, just one question: how come sometimes you call rlang functions with rlang::foo()
and other times just foo()
? I think either way works since we have @import rlang
in bayesplot-package.R
, just curious if your choices were intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Probably an artifact of me using typing rlang::[TAB]
to use autocomplete to find the right rlang function.
#' draws (rows). | ||
#' the points `(x,y) = (average(yrep[, n]), y[n])`, where each `yrep[, n]` is | ||
#' a vector of length equal to the number of posterior draws and `average()` | ||
#' is summary statistic. Unlike for `ppc_scatter()`, for `ppc_scatter_avg()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#' is summary statistic. Unlike for `ppc_scatter()`, for `ppc_scatter_avg()` | |
#' is a summary statistic. Unlike for `ppc_scatter()`, for `ppc_scatter_avg()` |
levels(data$rep_label) <- yrep_avg_label(stat) |> | ||
as.expression() |> | ||
as.character() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me to use the base R pipe, but we do import %>%
from dplyr so could also use that.
@tjmahr Also see @avehtari's comment here #350 (comment), which supports @kruschke's suggestion. When we were always using the mean it didn't matter if we were computing Then there's the separate question of what to plot on x and y axes for |
My thinking was stat(y - y_rep) is an average error, and y - stat(y_rep) is the error of the average prediction, so the former better matches the function names and how I think about ppc functions. I don't oppose the change, and updating the axis label will make it unambiguous what's being computed. |
Yeah I have mixed feelings about this. Both feel intuitive to me but represent slightly different things. A couple of options (there are probably others):
Do you have a preference? Or a 3rd option? |
I think it would be better to make a new function, to not break anyone's current plots. I was also thinking of name something like |
Following #348, allow user to set the averaging function so that, e.g., user can choose median for heavy-tailed distributions.
Created on 2025-05-13 with reprex v2.1.1