add fun_avg to ppc_avg functions #349

tjmahr · 2025-05-13T20:10:34Z

Following #348, allow user to set the averaging function so that, e.g., user can choose median for heavy-tailed distributions.

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)

# Defaults to mean
bayesplot::ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x) +
  ggplot2::expand_limits(x = c(-10, 10))

bayesplot::ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x, fun_avg = median) + 
  ggplot2::expand_limits(x = c(-10, 10))

Created on 2025-05-13 with reprex v2.1.1
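To illustrate why the choice of averaging function matters for heavy-tailed draws, here is a minimal base R sketch on simulated data. It does not use the gist data or any bayesplot internals: the variable names, the Cauchy noise, and the sample sizes are all assumptions made for illustration. It only follows the bayesplot convention that yrep is a draws-by-observations matrix.

```r
set.seed(1)
n_draws <- 4000
n_obs <- 50

# Hypothetical "true" locations and observed outcomes
mu <- seq(-2, 2, length.out = n_obs)
y <- mu + rnorm(n_obs, sd = 0.1)

# Simulated predictive draws with Cauchy noise (t distribution, df = 1);
# rows are draws, columns are observations
yrep <- matrix(mu, n_draws, n_obs, byrow = TRUE) + rt(n_draws * n_obs, df = 1)

# Per-draw errors y - y_rep, then one summary per observation
errors <- matrix(y, n_draws, n_obs, byrow = TRUE) - yrep
err_mean <- colMeans(errors)
err_median <- apply(errors, 2, median)

# The mean is dominated by a few extreme draws; the median is not
range(err_mean)
range(err_median)
```

With noise this heavy-tailed, the per-observation means should span a much wider range than the medians, which is the instability described in #348.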

codecov-commenter · 2025-05-13T20:17:09Z

Codecov Report

Attention: Patch coverage is 97.26027% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.60%. Comparing base (95a23b7) to head (9cb807c).

Files with missing lines	Patch %	Lines
R/ppc-errors.R	97.14%	1 Missing ⚠️
R/ppc-scatterplots.R	93.75%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
- Coverage   98.66%   98.60%   -0.06%     
==========================================
  Files          35       35              
  Lines        5600     5650      +50     
==========================================
+ Hits         5525     5571      +46     
- Misses         75       79       +4

TeemuSailynoja

Thank you for the fast reaction. This looks like a clean implementation to address the listed issue. My only comment is about the documentation, which could mention the expected format of fun_avg.

TeemuSailynoja · 2025-05-13T21:36:57Z

R/ppc-errors.R

+#' @param fun_avg Function to apply to compute the posterior average.
+#'   Defaults to `"mean"`.


In the ppc_stat functions, we have a similar argument, by the name stat, which does a very similar job:

#' @param stat A single function or a string naming a function, except for the
#' 2D plot which requires a vector of exactly two names or functions. In all
#' cases the function(s) should take a vector input and return a scalar
#' statistic. If specified as a string (or strings) then the legend will
#' display the function name(s). If specified as a function (or functions)
#' then generic naming is used in the legend.

We could align this doc to read, for example:

#' @param fun_avg A function or a string naming a function for computing the
#' posterior average. In both cases, the function should take a vector input and
#' return a scalar statistic. If specified as a string, then the legend will
#' display the function name. If specified as a function
#' then generic naming is used in the legend.
#' Defaults to "mean".
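As a rough sketch of the contract described in this doc proposal (a function or a string naming one, taking a vector and returning a scalar), base R's match.fun() can resolve both forms. The helper name resolve_stat() is hypothetical and is not part of bayesplot.

```r
# Hypothetical helper: accept a function or the name of a function and
# return a wrapper that checks the "vector in, scalar out" contract.
resolve_stat <- function(stat) {
  fun <- match.fun(stat)
  function(x) {
    out <- fun(x)
    stopifnot(is.numeric(out), length(out) == 1)
    out
  }
}

resolve_stat("median")(c(1, 2, 10))
resolve_stat(mean)(c(1, 2, 3))
```

A function such as range, which returns two values, would fail the length check rather than silently producing a malformed plot.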

The Average y - y_rep axis label is not affected. I didn't want to make yrep_avg_label() and error_avg_label() depend on fun_avg or change the default "Average y - y_rep" labels.

It does affect the $rep_label in ppc_scatter_avg_data(y, yrep) when fun_avg is a string.

wait, i'm just noticing Aki's comment. I'll switch to stat.

Thanks @tjmahr. The code looks good. And I agree with changing to stat

TeemuSailynoja · 2025-05-13T21:39:38Z

R/ppc-scatterplots.R

+#' @param fun_avg Function to apply to compute the posterior average.
+#'   Defaults to `"mean"`.


same as above.

avehtari · 2025-05-14T06:52:17Z

I was also thinking stat instead of fun_avg.

It would be good to change the y-axis label to show the stat which was used, and also add parentheses. For example:
"median(y - y_rep)" or "y - median(y_rep)"
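The two label forms suggested here could be built from the stat name along these lines. This is only a sketch: error_label() and its inside argument are hypothetical names, not the package's label helpers.

```r
# Hypothetical label builder: put the stat around the whole error,
# or only around y_rep, depending on what is actually computed.
error_label <- function(stat_name = "mean", inside = TRUE) {
  if (inside) {
    sprintf("%s(y - y_rep)", stat_name)
  } else {
    sprintf("y - %s(y_rep)", stat_name)
  }
}

error_label("median")
error_label("median", inside = FALSE)
```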

kruschke · 2025-05-14T16:30:01Z

Following up on @avehtari , also would be good to change the x-axis label to show the x variable name instead of just generic "$x$".

tjmahr · 2025-05-14T20:23:03Z

- I have renamed fun_avg to stat.
- The stat name is shown on the axis label.
- The stat argument now handles functions given as strings, function objects, anonymous function objects, and quoted function names.

Because the function name is used both for labeling and for invoking the function, I added as_tagged_function(), which keeps track of the expression provided by the user.
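The idea of keeping the user's expression alongside the function can be sketched in base R as follows. This is not the as_tagged_function() implementation from the PR. The helper tag_stat() and the fallback label "stat" are assumptions for illustration, and the sketch covers only strings, bare function names, and anonymous functions.

```r
# Hypothetical helper: return the resolved function together with a label
# derived from what the user typed.
tag_stat <- function(stat) {
  expr <- substitute(stat)       # the expression as written by the caller
  label <- if (is.character(stat)) {
    stat                         # "mean" -> "mean"
  } else if (is.symbol(expr)) {
    as.character(expr)           # median -> "median"
  } else {
    "stat"                       # anonymous functions get a generic label
  }
  list(fun = match.fun(stat), label = label)
}

tag_stat("mean")$label
tag_stat(median)$label
tag_stat(function(x) quantile(x, .1))$label
```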

library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)
y <- l$y 
yrep <- l$yrep 
x <- l$x

stat is inlined into the axis label

ppc_scatter_avg(y, yrep, stat = "mean")

stat can be a function

ppc_scatter_avg(y, yrep, stat = median)

or a symbol

ppc_scatter_avg(y, yrep, stat = quote(median))

stat can be a primitive function

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)

stat can be an anonymous function, but it gets a generic label

ppc_scatter_avg(y, yrep, stat = function(x) quantile(x, .1))

Created on 2025-05-14 with reprex v2.1.1

TeemuSailynoja · 2025-05-15T08:37:39Z

Following up on @avehtari , also would be good to change the x-axis label to show the x variable name instead of just generic "$x$".

If I'm not wrong, this is more challenging, as when passing x = data_frame$col_name, the information of the name of the column is lost, and only a numeric vector is passed to x.

tjmahr · 2025-05-15T19:22:33Z

If I'm not wrong, this is more challenging, as when passing x = data_frame$col_name, the information of the name of the column is lost, and only a numeric vector is passed to x.

No, you can capture that. (plot() does that.) It's really easy in a single function setup.

library(rlang)
library(ggplot2)
f <- function(x, y) {
  qx <- enquo(x)
  qy <- enquo(y)
  data <- data.frame(x = x, y = y)
  ggplot(data) + 
    aes(x, y) + 
    geom_point() + 
    labs(
      x = as_label(quo_get_expr(qx)),
      y = as_label(quo_get_expr(qy))
    )
}

data <- data.frame(
  y = bayesplot::example_y_data(),
  x = bayesplot::example_x_data()
)
f(data$x, data$y)

Created on 2025-05-15 with reprex v2.1.1

The hard part for us is that when you call f(x = my_x, y = my_y) which internally calls g(x, y), then g() loses access to the user expression unless you take extra care along the way. (Or use quosures. I'm not sure yet.) We do that when we modify data and call some internal pp_() function to render the plot.

My update yesterday kept track of the expression for stat along the way but it was pretty tricky.

Edit: I figured out how to handle the inner function problem by using quosures/tunnelling.
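The inner function problem can be reproduced without any packages. The sketch below uses base R's substitute() rather than the quosures used in the PR, and all function names are hypothetical. It shows the label being lost when an outer function simply forwards its argument, and one way to keep it by capturing the expression once at the outermost level.

```r
# Inner helper that tries to recover the caller's expression itself
inner_label <- function(x) deparse(substitute(x))

# Naive wrapper: the inner function only ever sees the symbol `x`
outer_naive <- function(x) inner_label(x)

# Wrapper that captures the user's expression and passes it down explicitly
inner_from_expr <- function(x_expr) deparse(x_expr)
outer_forward <- function(x) {
  x_expr <- substitute(x)
  inner_from_expr(x_expr)
}

df <- data.frame(col_name = 1:3)
outer_naive(df$col_name)    # "x"
outer_forward(df$col_name)  # "df$col_name"
```

Quosures serve the same purpose while also carrying the environment of the captured expression, which matters when the expression has to be evaluated later rather than just printed.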

tjmahr · 2025-05-15T20:49:55Z

- Label anonymous functions with "stat"
- Support ~ style anonymous functions
- Simplify as_tagged_function() by using quosures

library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)
y <- l$y 
yrep <- l$yrep 
x <- l$x

Function name

ppc_scatter_avg(y, yrep, stat = "mean")

ppc_error_scatter_avg(y, yrep, stat = "mean")

Function object

ppc_scatter_avg(y, yrep, stat = median)

ppc_error_scatter_avg(y, yrep, stat = median)

Primitive function

ppc_scatter_avg(y, yrep, x, stat = min)

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)

Anonymous function

ppc_scatter_avg(y, yrep, x, stat = function(x) quantile(x, .1))

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = function(x) quantile(x, .1))

Anonymous function (formulas)

ppc_scatter_avg(y, yrep, x, stat = ~ quantile(.x, .1))

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = ~ quantile(.x, .1))

Created on 2025-05-15 with reprex v2.1.1

tjmahr · 2025-05-16T18:01:04Z

This patch should be ready for review.

The expression for x now appears in the axis label.

library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)

ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x, stat = "sd")

Created on 2025-05-16 with reprex v2.1.1

tjmahr · 2025-05-19T14:04:04Z

@kruschke:

But the related thread regarding residuals #349 seems to suggest that residuals are computed as $stat(y - y_{rep})$, with $stat(y_{rep})$ not separately, explicitly computed. Hmmm...?

Sorry if I'm late to the party, but shouldn't the residual label be $y - stat(y_{rep})$, not $stat(y - y_{rep})$? Isn't $y - stat(y_{rep})$ being computed internally, not $stat(y - y_{rep})$?

stat(y - y_rep) is what is computed internally.

jgabry · 2025-05-20T21:51:48Z

@tjmahr sorry for the delay in reviewing this. Been a busy few days. Will try to get to it soon!

jgabry

This looks good! I made a few minor comments/questions. Aside from those, my only other question is: does it make sense to also use this new approach with the stat argument to ppc_stat()? You could do something similar to what you've done with the axis labels here but for the legend for ppc_stat(). What do you think?

jgabry · 2025-05-22T16:18:35Z

DESCRIPTION

@@ -26,7 +26,7 @@ URL: https://mc-stan.org/bayesplot/
 BugReports: https://github.com/stan-dev/bayesplot/issues/
 SystemRequirements: pandoc (>= 1.12.3), pandoc-citeproc
 Depends:
-    R (>= 3.1.0)
+    R (>= 4.1.0)


I think at this point it's been out long enough that bumping the required R version is fine.

R/bayesplot-helpers.R

R/ppc-scatterplots.R

jgabry · 2025-05-22T19:14:22Z

@tjmahr Also see @avehtari's comment here #350 (comment), which supports @kruschke's suggestion.

When we were always using the mean it didn't matter if we were computing y - stat(y_rep) or stat(y - y_rep). But if we're going to allow arbitrary functions then the two quantities won't necessarily be the same and users will likely expect us to be computing y - stat(y_rep), i.e. y minus a point prediction.

Then there's the separate question of what to plot on x and y axes for ppc_error_scatter_avg. @kruschke is right that standard residual plots would put y - stat(y_rep) on the y-axis and stat(y_rep) on the x-axis. If we switch to doing that do you think that's too big of a change? That is, if we do that should we change the name of the function and deprecate the old version? Some other option?

tjmahr · 2025-05-22T20:43:32Z

But if we're going to allow arbitrary functions then the two quantities won't necessarily be the same and users will likely expect us to be computing y - stat(y_rep), i.e. y minus a point prediction.

My thinking was stat(y - y_rep) is an average error, and y - stat(y_rep) is the error of the average prediction, so the former better matches the function names and how I think about ppc functions.

I don't oppose the change, and updating the axis label will make it unambiguous what's being computed.
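A small worked example for a single observation makes the distinction concrete. The numbers are made up. For one observation y_i is a constant, so the two quantities agree for the mean and also for the median, but they differ for statistics such as min or sd.

```r
y_i <- 10              # one observed value (made-up number)
yrep_i <- c(1, 2, 6)   # its predictive draws (made-up numbers)

compare <- function(stat) {
  c(
    stat_of_error = stat(y_i - yrep_i),  # "average error"
    error_of_stat = y_i - stat(yrep_i)   # "error of the average prediction"
  )
}

compare(mean)    # both 7
compare(median)  # both 8
compare(min)     # 4 versus 9
compare(sd)      # the spread of the errors versus y_i minus a spread
```

For min, the first form equals y_i minus the largest draw, and for sd it no longer depends on y_i at all, so the axis label has to state which form was computed.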

jgabry · 2025-05-22T21:43:17Z

Yeah I have mixed feelings about this. Both feel intuitive to me but represent slightly different things.

A couple of options (there are probably others):

1. Change ppc_error_scatter_avg to do y - stat(y_rep) on the y-axis and stat(y_rep) on the x-axis. This is what @kruschke was expecting and @avehtari agreed with, so it seems like many users would probably expect this.
2. Keep ppc_error_scatter_avg as you have it and add a new one that does y - stat(y_rep) on the y-axis and stat(y_rep) on the x-axis. We could call it ppc_residual() or something like that. I think with the right documentation having both would probably be ok.

Do you have a preference? Or a 3rd option?

avehtari · 2025-05-23T06:31:05Z

I think it would be better to make a new function, so as not to break anyone's current plots. I was also thinking of a name like ppc_residual(). @TeemuSailynoja also has a new binned residual plot with PAVA, which could then be ppc_residual_binned(), and we would not change the current ppc_error_binned().
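For reference, the quantities such a residual plot would show can be sketched in a few lines of base R. No ppc_residual() exists in this PR. The helper residual_data() and its column names are hypothetical, and yrep is assumed to be a draws-by-observations matrix.

```r
# Hypothetical helper: point prediction stat(y_rep) per observation and the
# residual y - stat(y_rep), ready to plot as residual versus predicted.
residual_data <- function(y, yrep, stat = mean) {
  stat <- match.fun(stat)
  predicted <- apply(yrep, 2, stat)
  data.frame(predicted = predicted, residual = y - predicted)
}

# Two observations, three draws each (made-up numbers)
yrep <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
y <- c(2, 7)
residual_data(y, yrep, stat = "median")
```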

tjmahr · 2025-07-02T17:10:06Z

Keep ppc_error_scatter_avg as you have it and add a new one that does y - stat(y_rep) on the y-axis and stat(y_rep) on the x-axis. We could call it ppc_residual() or something like that. I think with the right documentation having both would probably be ok.

Okay, that seems to be the plan now in #348.

tjmahr · 2025-07-02T17:57:32Z

Here's the hopefully final set of test plots for the stat functionality.

library(bayesplot)
#> This is bayesplot version 1.13.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)
y <- l$y 
yrep <- l$yrep 
x <- l$x

stat is inlined into the axis label

ppc_scatter_avg(y, yrep, stat = "mean")

stat can be a function

ppc_scatter_avg(y, yrep, stat = median)

stat can be a primitive function

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)

stat can be an anonymous function, but it gets a generic label

ppc_scatter_avg(y, yrep, stat = function(x) quantile(x, .1))

ppc_scatter_avg(y, yrep, stat = ~quantile(.x, .1))

Created on 2025-07-02 with reprex v2.1.1

tjmahr · 2025-07-02T19:40:43Z

This should be ready to merge.

TeemuSailynoja

This looks very good! Thank you, @tjmahr !

add fun_avg to ppc_avg functions

c96c7d6

tjmahr requested a review from TeemuSailynoja May 13, 2025 20:10

tjmahr mentioned this pull request May 13, 2025

ppc_error_scatter_avg_vs_x, has unstable residuals when the noise distribution has heavy tails; needs median not mean? #348

Closed

TeemuSailynoja reviewed May 13, 2025

forward "stat" to axis labels in ppc

ffe723f

fix docs

c647d69

tjmahr marked this pull request as draft May 15, 2025 19:23

tjmahr added 2 commits May 15, 2025 15:29

simplify as_tagged_function()

23865df

use "stat" for anon functions. support formulas

27c17d6

tjmahr added 3 commits May 16, 2025 12:06

use x expression as axis label

2deeca9

bump r version to support native pipe (added 2021)

1acd544

- fixed roxygen warning

4293eb8

- avoid global for y, italic

tjmahr linked an issue May 16, 2025 that may be closed by this pull request

ppc_error_scatter_avg_vs_x, has unstable residuals when the noise distribution has heavy tails; needs median not mean? #348

Closed

tjmahr requested a review from jgabry May 16, 2025 18:02

tjmahr marked this pull request as ready for review May 16, 2025 18:02

kruschke mentioned this pull request May 17, 2025

ppc_error_scatter_avg() should be able to plot residuals as function of predicted y, not y #350

Open

jgabry reviewed May 22, 2025

behramulukir mentioned this pull request Jul 2, 2025

Updating residual plots #358

Open

tjmahr added 2 commits July 2, 2025 12:15

merge changes from main tree

0993aa3

Merge remote-tracking branch 'origin/master' into fun-avg # Conflicts: # NEWS.md

final clean up

9cb807c

TeemuSailynoja approved these changes Jul 3, 2025

TeemuSailynoja merged commit 527c48c into master Jul 3, 2025
6 checks passed
