Improvements to multimodal HDI #28
More modular functions and vectorization
I love the improvements, thanks!
Completely unrelated, but it would also be interesting to know how hard it was to navigate the codebase in the current state of docs tending to zero, both so we can prioritize which docs and tests to write first and maybe rethink some design choices.
Oh, one thing I left out but will add is the ability to restrict the bounds to points actually in the sample. There are two ways to do this: via trimming or interpolation. As I shared on Slack, experiments show that interpolation produces better estimates than trimming and, for moderate to large sample sizes (n > O(100)), is better than KDE-based estimates. The latter makes me wonder if this should be the default, but I hesitate mostly because I haven't seen it discussed in the literature. I'm not certain what keyword to use to allow the user to configure this behavior. BTW, what are the names …
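For concreteness, here is a minimal sketch of the trimming variant under one reading of it. The helper name `snap_to_sample` and its exact behaviour are illustrative assumptions, not the PR's API, and the interpolation variant is left out because the thread does not spell out its details:

```python
import numpy as np


def snap_to_sample(draws, lower, upper):
    """Hypothetical helper: move each HDI bound inward to the nearest
    observed draw, so both endpoints are values actually present in
    the sample."""
    x = np.sort(np.asarray(draws))
    # smallest draw >= lower and largest draw <= upper, clipped to the sample
    i_lo = min(np.searchsorted(x, lower, side="left"), len(x) - 1)
    i_hi = max(np.searchsorted(x, upper, side="right") - 1, 0)
    return x[i_lo], x[i_hi]
```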
IIRC, there were two implementations of hdi in regular arviz. One that used the raw samples, looking at how many …
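For reference, a minimal sketch of that raw-sample approach as I understand it (my reconstruction, not a quote of the arviz source): sort the draws, slide a window that must cover the requested probability, and keep the narrowest window.

```python
import numpy as np


def hdi_contiguous(samples, prob):
    """Shortest contiguous interval containing ``prob`` of the draws.

    Both bounds are observed draws and the interval is contiguous by
    construction.
    """
    x = np.sort(np.asarray(samples))
    n = len(x)
    m = int(np.floor(prob * n))   # draws every candidate window must span
    widths = x[m:] - x[: n - m]   # width of each window of m + 1 draws
    i = np.argmin(widths)
    return x[i], x[i + m]
```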
As far as I can tell from reading the regular arviz code, the two methods that are supported are 1) the same as "nearest" here and 2) the same as "multimodal" before this PR. "agg_nearest" does not seem to be supported. I was more wondering why these names were chosen. e.g. what is "nearest" near to? The original draws? If so then I think "contiguous" is a more accurate name, since that's really the constraint applied by this method.
```python
import numpy as np
import numba


@numba.jit
def hdi_contiguous_weighted(bins, bin_probs, prob):
    n = len(bins)
    is_discrete = bins.dtype.kind != "f"
    cum_probs = np.cumsum(bin_probs)
    bins_diff = np.diff(bins)
    i_lower = 0
    i_upper = np.searchsorted(cum_probs, prob, side="left")
    interval_width = bins[i_upper] - bins[i_lower] + is_discrete
    min_interval_width = interval_width
    interval_prob = cum_probs[i_upper]
    interval = np.array([i_lower, i_upper])
    while i_upper < n - 1:
        # increase lower bound until interval is invalid,
        # recording the narrowest valid window seen so far
        while interval_prob >= prob and i_lower <= i_upper:
            if interval_width < min_interval_width:
                interval[0] = i_lower
                interval[1] = i_upper
                min_interval_width = interval_width
            interval_prob -= bin_probs[i_lower]
            interval_width -= bins_diff[i_lower]
            i_lower += 1
        # increase upper bound until interval is valid again
        while interval_prob < prob and i_upper < n - 1:
            interval_width += bins_diff[i_upper]
            i_upper += 1
            interval_prob += bin_probs[i_upper]
    # final shrink pass so windows ending at the last bin are candidates too
    while interval_prob >= prob and i_lower <= i_upper:
        if interval_width < min_interval_width:
            interval[0] = i_lower
            interval[1] = i_upper
            min_interval_width = interval_width
        interval_prob -= bin_probs[i_lower]
        if i_lower < n - 1:
            interval_width -= bins_diff[i_lower]
        i_lower += 1
    return bins[interval]
```
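A minimal usage sketch with synthetic bimodal draws, assuming `bins` and `bin_probs` come from a histogram of the sample (the data and binning choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
draws = np.concatenate([rng.normal(-3, 1, 5_000), rng.normal(3, 1, 5_000)])
counts, edges = np.histogram(draws, bins=100)
bins = 0.5 * (edges[:-1] + edges[1:])  # bin midpoints
bin_probs = counts / counts.sum()      # normalized bin weights
print(hdi_contiguous_weighted(bins, bin_probs, 0.94))
```

On bimodal draws like these, the contiguous interval necessarily spans the low-density trough between the modes, which is exactly the case the multimodal method is meant to handle.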
Let's remove …
I think the only thing left for merging is the API and behaviour for …
I've added …
I think …
Renamed method and opened an issue for the unimodal version so we can merge
Thanks, @OriolAbril!
This PR implements the improvements to HDI suggested in arviz-devs/arviz#2394, with a few differences: if more intervals than max_modes are computed, now the max_modes highest probability intervals are returned instead of just the ones that are lowest on the real line.

📚 Documentation preview 📚: https://arviz-stats--28.org.readthedocs.build/en/28/
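To make the max_modes behaviour concrete, a hedged sketch of the selection step; the helper and argument names are hypothetical, not the package's API:

```python
import numpy as np


def keep_highest_prob_intervals(intervals, interval_probs, max_modes):
    """Hypothetical helper mirroring the behaviour described above: keep
    the ``max_modes`` intervals with the most probability mass, returned
    in left-to-right order rather than simply the lowest ones."""
    top = np.argsort(np.asarray(interval_probs))[::-1][:max_modes]
    return [intervals[i] for i in np.sort(top)]
```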