Nicer representations of intervals #25

sethaxen · 2024-02-01T23:30:46Z

As pointed out in arviz-devs/arviz#2306, HDI is different from ETI in that for HDI it doesn't make sense to assign a percentage to the bounds as we currently do. The bounds themselves are not interesting; only the whole interval is interesting. By contrast, we could either represent an ETI with a percentage describing the probability in that interval or with percentages for each bound. The bounds themselves are interesting.

Furthermore, the name ETI is perhaps not so well known. Since horizontal space is precious for showing summary stats in the REPL, I propose we clarify the HDI bound with no additional horizontal space taken up with the following change:

julia> using PosteriorStats

julia> function ar1(φ::Real, σ::Real, n::Int...)
           T = float(Base.promote_eltype(φ, σ))
           x = randn(T, n...)
           x .*= σ
           accumulate!(x, x; dims=1) do xi, ϵ
               return muladd(φ, xi, ϵ)
           end
           return x
       end;

julia> x = ar1(0.5, sqrt(1-0.5^2), 200, 4, 5) .* reshape(1:5, 1, 1, :) .+ reshape(1:5, 1, 1, :);

First, we store the HDI as a tuple and drop the unnecessary % and _ from the column name, producing:

julia> summarize(x)
SummaryStats
    mean   std  hdi94           mcse_mean  mcse_std  ess_tail  ess_bulk  rhat 
 1   1.0  1.01  (-0.678, 3.00)      0.059     0.030       475       293  1.01
 2   2.0  2.0   (-1.45, 6.07)       0.11      0.060       435       308  1.02
 3   2.8  3.0   (-2.80, 8.10)       0.21      0.097       414       220  1.01
 4   3.5  3.9   (-3.04, 11.8)       0.23      0.11        548       296  1.01
 5   4.8  4.8   (-3.91, 14.1)       0.27      0.16        489       331  1.01

Second, we replace eti_XXX% with qXXX, similar to the posterior package. While q is very short, I think it's understandable what these mean, and we could easily see this generalizing to include more than just the 2 quantiles:

julia> summarize(x, default_stats(median)...)
SummaryStats
    median   mad      q3    q97 
 1    1.06  1.07  -0.932   2.90
 2    2.01  1.90  -1.97    5.69
 3    2.76  3.17  -2.90    7.99
 4    3.49  3.91  -3.99   11.0
 5    4.66  4.62  -4.07   13.9

The text was updated successfully, but these errors were encountered:

sethaxen · 2024-02-11T19:29:32Z

@devmotion @torfjelde what do you think of this? It would change how MCMCChains.Chains are displayed after TuringLang/MCMCChains.jl#431. It would also be breaking, so it would be nice to decide on this before merging that PR.

sethaxen · 2024-02-14T09:57:09Z

An alternative would be to use an IntervalSets.ClosedInterval, which would look like this:

julia> summarize(x)
SummaryStats
     mean    std  hdi94            mcse_mean  mcse_std  ess_tail  ess_bulk  rhat 
 1  0.499  0.291  0.0379 .. 0.975     0.0047    0.0020      3993      3878  1.00
 2  0.500  0.287  0.0521 .. 0.991     0.0046    0.0021      3926      3897  1.00
 3  0.491  0.289  0.0197 .. 0.948     0.0046    0.0020      4171      3820  1.00

On the upside, the type specifies whether the bounds are inclusive (they are), and the visualization is equally compact. Also, the next release of IntervalSets contains a plotting recipe for the intervals.

On the downside, the use of ellipses could look like columns were truncated from the middle like pandas does. We could change the representation to use square brackets, but that might make the user thing it's a length 2 vector.

devmotion · 2024-02-14T14:03:22Z

Generally, I strive for avoiding unnecessary dependencies but I think I'd prefer both the type representation as a ClosedInterval and the REPL printout (numbers correctly aligned) in your last comment, so I'd be in favour of using IntervalSets.

sethaxen · 2024-02-14T14:09:17Z

numbers correctly aligned

Alignment is tricky here. For intervals we would probably align on the .., so the numbers themselves wouldn't necessarily be aligned (there's no way to align on all decimals and .. without circumventing PrettyTables' own formatting pipeline for this column)

devmotion · 2024-02-14T14:10:51Z

Ah I see, then it was just a coincidence since all numbers were positive? In any case, I still prefer .. over the tuple representation I think.

sethaxen · 2024-02-14T17:54:01Z

Ah I see, then it was just a coincidence since all numbers were positive?

Yeah, here's kind of the worst-case scenario (parameter scales differ by 3 or more orders of magnitude):

julia> x = ar1(0.5, sqrt(1-0.5^2), 200, 4, 5) .* reshape(exp10.(1:5), 1, 1, :) .+ reshape(1:5, 1, 1, :);

julia> summarize(x)
SummaryStats
       mean        std  hdi94                  mcse_mean  mcse_std  ess_tail  ess_bulk  rhat 
 1   0.1     10.1           -20.6 .. 17.3        0.55      0.30          493       343  1.00
 2  -0.0009  94              -179 .. 171         5.2       2.9           556       319  1.01
 3  -1        9.8e+02   -1.75e+03 .. 1.88e+03   59        28             580       277  1.01
 4  -1.e+02   1.e+04    -1.82e+04 .. 1.84e+04    6.0e+02   5.6e+02       346       265  1.02
 5   4.e+03   1.03e+05  -1.74e+05 .. 2.04e+05    6.5e+03   3.0e+03       560       250  1.03

In any case, I still prefer .. over the tuple representation I think.

Okay! Unless @torfjelde disagrees, I'll go with that.

sethaxen · 2024-09-03T10:27:06Z

@devmotion @torfjelde When a user specifies default_summary_stats(median) as the statistics, the ETI is used instead of HDI. Unlike HDI, the quantiles themselves have sensible names and could be represented separately. Do you think it would make more sense for users to get an interval consistent with HDI or to get separate columns? e.g.

julia> summarize(x, default_summary_stats()...)
SummaryStats
    mean   std  hdi_94%         mcse_mean  mcse_std  ess_tail  ess_bulk  rhat 
 1   1.1  0.98  -0.808 .. 2.83      0.053     0.030       506       341  1.00
 2   2.1  1.9    -1.35 .. 5.79      0.12      0.057       448       266  1.01
 3   3.1  2.8    -1.93 .. 8.50      0.17      0.091       507       276  1.01
 4   3.8  4.1    -4.45 .. 11.1      0.23      0.13        472       315  1.00
 5   5.2  4.9    -3.54 .. 15.1      0.29      0.18        412       279  1.01

For HDI vs for ETI

julia> summarize(x, default_summary_stats(median)...)
SummaryStats
    median    mad  eti_94%         mcse_median  ess_tail  ess_median  rhat 
 1     1.1  0.990  -0.725 .. 2.94        0.064       506         354  1.00
 2     2.1  1.83    -1.43 .. 5.78        0.13        448         316  1.01
 3     3.0  2.79    -1.96 .. 8.46        0.18        507         438  1.01
 4     3.8  4.27    -3.96 .. 11.7        0.37        472         372  1.00
 5     5.1  4.80    -4.55 .. 14.5        0.36        412         351  1.01

OR

julia> summarize(x, default_summary_stats(median)...)
SummaryStats
    median    mad     q3%   q97%  mcse_median  ess_tail  ess_median  rhat 
 1     1.1  0.990  -0.725   2.94        0.064       506         354  1.00
 2     2.1  1.83   -1.43    5.78        0.13        448         316  1.01
 3     3.0  2.79   -1.96    8.46        0.18        507         438  1.01
 4     3.8  4.27   -3.96   11.7         0.37        472         372  1.00
 5     5.1  4.80   -4.55   14.5         0.36        412         351  1.01

devmotion · 2024-09-03T21:46:08Z

I'm leaning towards the interval representation: Partly for consistency, and partly because I think the main interest is the interval, not the 3rd and 97th percentile.

torfjelde · 2024-09-18T13:30:26Z

Sorry, just saw this. But yes, all of this seems very sensible:) Deffo a nice change!

sethaxen mentioned this issue Feb 13, 2024

Import and extend PosteriorStats TuringLang/MCMCChains.jl#431

Open

sethaxen mentioned this issue Sep 5, 2024

Represent HDI and ETI using IntervalSets #36

Merged

sethaxen closed this as completed Sep 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nicer representations of intervals #25

Nicer representations of intervals #25

sethaxen commented Feb 1, 2024

sethaxen commented Feb 11, 2024

sethaxen commented Feb 14, 2024

devmotion commented Feb 14, 2024

sethaxen commented Feb 14, 2024

devmotion commented Feb 14, 2024

sethaxen commented Feb 14, 2024

sethaxen commented Sep 3, 2024

devmotion commented Sep 3, 2024

torfjelde commented Sep 18, 2024

Nicer representations of intervals #25

Nicer representations of intervals #25

Comments

sethaxen commented Feb 1, 2024

sethaxen commented Feb 11, 2024

sethaxen commented Feb 14, 2024

devmotion commented Feb 14, 2024

sethaxen commented Feb 14, 2024

devmotion commented Feb 14, 2024

sethaxen commented Feb 14, 2024

sethaxen commented Sep 3, 2024

devmotion commented Sep 3, 2024

torfjelde commented Sep 18, 2024