ppc_error_scatter_avg() should be able to plot residuals as function of predicted y, not y #350
Comments
Thanks for opening the issue.
That's a good point. However, we can't just plot
Yeah that's right. We're defining a predictive error to be y - y_rep and then computing a summary of the errors. This lets us compute any summary of the full distribution of predictive errors. This seems like the right thing to do to me. @avehtari what do you think about both of these questions?
Right. My comment uses notation I would prefer the default Thanks!
I think Andrew's comments on this are relevant. See a recent blog post https://statmodeling.stat.columbia.edu/2025/05/11/plotting-truth-vs-predicted-value/ and section 11.3 of Regression and Other Stories: “A confusing choice: plot residuals vs. predicted values, or residuals vs. observed values?”
Sorry, I was working too quickly and misread your notation.
Yeah I know what you mean (no pun intended). When initially developing the package many years ago the mean seemed like what users would expect the default to be for functions that computed a statistic, which is why it's always been the default for functions like
Mean in
@avehtari I think that chapter in RAOS argues for doing what @kruschke suggests, right? It says
In those figures, is the residual computed as y - stat(y_rep) or stat(y - y_rep)? That is, is it y minus a point prediction, or is it a point summary of the distribution of errors, stat(y - y_rep)? I guess if using the mean/median it doesn't matter, but with the
Good point
Yes.
As y - stat(y_rep), and y_pred is the same as stat(y_rep). y_rep represents the predictive distribution, y_pred is a point prediction (computed with the mean in that example), and the residual is y - y_pred. As the mean of the posterior predictive distribution is the most common point prediction, I think mean is still the best default. It would be good to add an option that allows the point prediction to be computed with some other function. TJ also made plots with sd, but that is not a sensible point prediction, and the language of error or residual is then confusing.
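To make the distinction between y - stat(y_rep) and stat(y - y_rep) concrete, here is a minimal sketch in plain Python (not bayesplot code). The observation `y_obs` and the draws in `y_rep` are made-up values for illustration, and `compare` is a hypothetical helper. With the mean or median as the statistic the two definitions agree, but with sd they do not: sd(y - y_rep) is just sd(y_rep), which is a spread, not an error.

```python
import random
import statistics

random.seed(0)

# Hypothetical single observation and simulated posterior predictive draws for it.
y_obs = 10.0
y_rep = [random.gauss(9.0, 1.5) for _ in range(1001)]

# Distribution of predictive errors for this observation.
errors = [y_obs - draw for draw in y_rep]

def compare(stat):
    """Return (y - stat(y_rep), stat(y - y_rep)) for one summary function."""
    return y_obs - stat(y_rep), stat(errors)

mean_a, mean_b = compare(statistics.fmean)
median_a, median_b = compare(statistics.median)
sd_a, sd_b = compare(statistics.stdev)

print("mean:  ", mean_a, mean_b)      # the two definitions agree
print("median:", median_a, median_b)  # the two definitions agree
print("sd:    ", sd_a, sd_b)          # the two definitions differ
```

The agreement for mean and median holds because both commute with the map from a draw to y minus that draw; a scale summary such as sd does not.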
Ok yeah thanks, just double checking.
That sounds reasonable. I've added a comment on #349 reflecting where we ended up with this discussion.
Yeah this makes complete sense. I've been trying to get too many things done today and moving too quickly and not stopping to think enough apparently!
(Notation below: $y$ is data value, $y_{pred}$ is predicted value, with $y_{pred}$ computed as $stat(y_{rep})$ from posterior draws.)
The usual residual analysis plots $y - y_{pred}$ on the vertical axis with $y_{pred}$ on the horizontal axis.
But `ppc_error_scatter_avg()` plots $y - y_{pred}$ on the vertical axis with $y$ on the horizontal axis. This is confusing and difficult to interpret (for me, but I'm not alone: https://stats.stackexchange.com/a/146002).
It would be great if `ppc_error_scatter_avg()` plotted $y_{pred}$ on the horizontal axis, either by default or with that as an option. (Or maybe there's already an easy way to do that; sorry if I missed it.)
Thanks for considering!
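One way to see why residuals plotted against observed $y$ are hard to read is the following minimal sketch in plain Python. It assumes a simulated data set and uses a simple least-squares fit as a stand-in for the point prediction $y_{pred}$ (an assumption for illustration, not how bayesplot computes anything); `pearson` is a hypothetical helper. For such a fit the residuals are uncorrelated with $y_{pred}$ by construction, but they are positively correlated with $y$ even when the model is correct, so a trend appears in the residuals-vs-$y$ plot regardless of fit quality.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

random.seed(1)

# Simulated data from a correctly specified linear model (illustrative values).
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 2.0) for xi in x]

# Least-squares fit with intercept, standing in for the point prediction y_pred.
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
y_pred = [intercept + slope * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, y_pred)]

corr_pred = pearson(resid, y_pred)  # essentially zero by construction
corr_obs = pearson(resid, y)        # clearly positive despite a correct model

print("corr(residual, y_pred):", corr_pred)
print("corr(residual, y):     ", corr_obs)
```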
P.S. My comment here assumes that $y_{pred}$ is computed as $stat(y_{rep})$ from posterior draws. But the related thread regarding residuals #349 seems to suggest that residuals are computed as $stat(y - y_{rep})$, with $stat(y_{rep})$ not separately, explicitly computed. Hmmm...?