PPC Calibration plots #352

Draft · wants to merge 5 commits into master

Conversation

@TeemuSailynoja (Collaborator) commented May 19, 2025

This is my work in progress on the PAVA calibration plots discussed in #343.

Currently implemented:

  • ppc_calibration_overlay()
  • ppc_calibration_overlay_grouped()
  • ppc_calibration()
  • ppc_calibration_grouped()
  • .ppc_calibration_data() - internal function

Needs:

  • A fast example for testing the functions.
  • Fix the intervals in ppc_calibration().
  • Example usage in the documentation.
  • LOO versions.
  • Decide whether .ppc_calibration_data() should be exposed to users.
  • Tests.
  • Check that the input parameter names and default values make sense and are intuitive.
  • Add documentation and comments to the code.

@codecov-commenter commented May 19, 2025

Codecov Report

Attention: Patch coverage is 0% with 134 lines in your changes missing coverage. Please review.

Project coverage is 96.28%. Comparing base (f8fab2f) to head (f9806eb).
Report is 18 commits behind head on master.

Files with missing lines   Patch %   Lines
R/ppc-calibration.R        0.00%     134 Missing ⚠️
@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   98.60%   96.28%   -2.33%     
==========================================
  Files          35       36       +1     
  Lines        5533     5673     +140     
==========================================
+ Hits         5456     5462       +6     
- Misses         77      211     +134     


@TeemuSailynoja (Collaborator, Author) commented
Examples

These should allow for some tests of these functions.

Creating example data

library(bayesplot)
y_range <- range(example_y_data(), example_yrep_draws())
ymin <- y_range[1]
ymax <- y_range[2]
# Observations and posterior predictive probabilities.
y <- rbinom(length(example_y_data()), 1, (example_y_data() - ymin) / (ymax - ymin))
prep <- (example_yrep_draws() - ymin) / (ymax - ymin)
groups <- example_group_data()

PAVA Calibration overlay

Basic

ppc_calibration_overlay(y, prep[1:50,])

[image: PAVA calibration overlay plot]

Grouped

ppc_calibration_overlay_grouped(y, prep[1:50,], groups)

[image: grouped PAVA calibration overlay plot]

PAVA Calibration

This isn't quite what we want yet: the interval shown here is not the one we use in the paper. There, we use consistency intervals, that is, intervals centered at the diagonal that display where the calibration curve should lie if the model is calibrated, i.e., the posterior mean should stay within these bounds.
In this implementation, I'm plotting a confidence interval, which shows where we think the calibration curve lies, i.e., the diagonal should be included.

ppc_calibration(y, prep)

[image: PAVA calibration plot with interval]

ppc_calibration_grouped(y, prep, groups)

[image: grouped PAVA calibration plot]
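To make the consistency-interval idea concrete, here is a minimal sketch of one way such a band could be computed; this is an illustration under stated assumptions, not the PR's implementation, and consistency_band and its arguments are hypothetical. The idea: simulate outcomes as if the predicted probabilities were perfectly calibrated, fit a PAVA curve to each simulated data set, and take pointwise quantiles of those curves.

# Hypothetical helper, not part of this PR: a consistency band computed by
# simulating outcomes under perfect calibration and applying PAVA to each
# simulated data set.
consistency_band <- function(p, n_sims = 1000, prob = 0.95) {
  p_sorted <- sort(p)
  curves <- replicate(n_sims, {
    y_sim <- rbinom(length(p_sorted), 1, p_sorted)  # outcomes if p is calibrated
    stats::isoreg(p_sorted, y_sim)$yf               # PAVA calibration curve
  })
  alpha <- (1 - prob) / 2
  list(
    p     = p_sorted,
    lower = apply(curves, 1, stats::quantile, probs = alpha),
    upper = apply(curves, 1, stats::quantile, probs = 1 - alpha)
  )
}

# The observed PAVA curve of a calibrated model should stay inside the band.
band <- consistency_band(colMeans(prep))

By construction this band hugs the diagonal, matching the description above, whereas the confidence interval currently plotted is centered at the estimated curve.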

@jgabry (Member) left a comment

This all sounds good, thanks @TeemuSailynoja. I made a few small review comments/questions. In addition to those questions, when you say

This isn't quite what we want yet: the interval shown here is not the one we use in the paper.

you mean that we will want to change this to use the consistency intervals you use in the paper, right? Do you think it's at all useful to give the user the option to choose which kind of interval? Or just strictly better to use the consistency intervals? I hadn't really thought about that.

Comment on lines +203 to +209
if (requireNamespace("monotone", quietly = TRUE)) {
  # Use the fast C implementation of PAVA from the 'monotone' package.
  monotone <- monotone::monotone
} else {
  # Fall back to base R isotonic regression, which fits the same curve.
  monotone <- function(y) {
    stats::isoreg(y)$yf
  }
}
@jgabry (Member):

Is there an advantage to using monotone::monotone instead of stats::isoreg?

@jgabry (Member) commented May 22, 2025:

That is, does it do something slightly better? Or the same thing more efficiently? I've seen stats::isoreg before but I had never seen the monotone package. If there's no difference then it's probably not worth checking for the monotone package. If it's better then we could put monotone in Suggests and then check for it like you do here.
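For reference, a quick check along these lines could answer this; a sketch, assuming the bench package purely for timing (any benchmarking tool would do). Since the PR's fallback treats the two as interchangeable, the fits should agree up to numerical error, and the question reduces to speed.

y <- cumsum(rnorm(1e5))             # an arbitrary test sequence
fit_fast <- monotone::monotone(y)   # 'monotone' package implementation of PAVA
fit_base <- stats::isoreg(y)$yf     # base-R isotonic regression
all.equal(fit_fast, fit_base)       # same unweighted isotonic fit
bench::mark(monotone::monotone(y), stats::isoreg(y)$yf)  # timing comparison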

#' @rdname PPC-calibration
#' @export
ppc_calibration_overlay <- function(
    y, prep, ..., linewidth = 0.25, alpha = 0.5) {
@jgabry (Member):

So for these functions prep is a matrix of probabilities, not a matrix of draws of binary outcomes from the posterior predictive distribution, right? In that case the argument name prep makes sense. But the description at the top of the file says

Assess the calibration of the predictive distributions yrep in relation to the data `y`

which makes it sound like the user should give us yrep. So I think we just need to reconcile how we describe this to the user.
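As an illustration of that distinction, here is a hypothetical example (not from the PR; fit, x, and d are placeholders) of where prep would come from with an rstanarm binary regression: posterior_epred() returns the event probabilities, while posterior_predict() returns the binary yrep draws.

fit <- rstanarm::stan_glm(y ~ x, family = binomial(), data = d)
prep <- rstanarm::posterior_epred(fit)    # draws x N matrix of probabilities
yrep <- rstanarm::posterior_predict(fit)  # draws x N matrix of 0/1 outcomes
ppc_calibration_overlay(d$y, prep[1:50, ])  # calibration wants prep, not yrep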
