Proposal: Cholesky-tracing to speed up posterior predictive sampling #5451
michaelosthege started this conversation in Ideas
Replies: 1 comment
For my specific problem that initiated this line of thought, it turned out that my conda environment was broken. Probably some BLAS/NumPy dependencies broke and I'll have to nuke it. So maybe others can comment on whether they have experienced performance problems? The proposed solution is rather simple, but we don't need to fix a problem that doesn't exist.
Original post:
Let 👇 be part of a big, big model that's really expensive to run MCMC on.
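For example, a latent GP along these lines (a minimal sketch only; the data, priors, and variable names are placeholders, and the real model would contain much more than this):

```python
import numpy as np
import pymc as pm

# Toy data standing in for the real (large) problem.
X = np.linspace(0, 10, 200)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.default_rng(1).normal(size=200)

with pm.Model() as model:
    # Priors on the covariance hyperparameters mean that cov(X), and hence
    # its Cholesky factor, changes with every posterior draw.
    lengthscale = pm.Gamma("lengthscale", alpha=2.0, beta=1.0)
    scaling = pm.HalfNormal("scaling", sigma=1.0)
    cov_func = scaling**2 * pm.gp.cov.ExpQuad(1, ls=lengthscale)

    gp = pm.gp.Latent(cov_func=cov_func)
    f = gp.prior("f", X=X)  # internally builds cholesky(stabilize(cov(X)))

    sigma = pm.HalfNormal("sigma", sigma=0.5)
    pm.Normal("y", mu=f, sigma=sigma, observed=y)
```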
Then a typical workflow would be 👇: first running just the expensive MCMC, for example on a cluster, and doing the diagnostics & visualizations on another machine later.
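For instance (continuing the sketch above; file names, sampler settings, and the prediction grid are made up):

```python
import arviz as az

# On the cluster: run only the expensive MCMC and store the trace.
with model:
    idata = pm.sample(1000, tune=1000, chains=4)
idata.to_netcdf("gp_trace.nc")

# Later, on another machine: re-create the model from the same code,
# reload the trace, and do diagnostics / predictions.
idata = az.from_netcdf("gp_trace.nc")
with model:
    f_pred = gp.conditional("f_pred", Xnew=np.linspace(0, 12, 50)[:, None])
    ppc = pm.sample_posterior_predictive(idata, var_names=["f_pred"])
```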
The Problem
The most expensive (?) step in the model is the `cholesky(stabilize(cov(X)))`, which needs to be computed for every posterior draw as soon as the covariance function has priors for `scaling`/`lengthscale`.
For the `conditional(Xnew)`, the `cholesky(stabilize(cov(X)))` is created again.
While the `InferenceData` contains draws for `scaling`/`lengthscale`, this most expensive step of the model is re-computed.
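To make the cost concrete (a standalone illustration, not PyMC code; the random matrix is just a stand-in for `stabilize(cov(X))` and the timing depends entirely on the machine and BLAS):

```python
import numpy as np
from time import perf_counter

n = 2000
A = np.random.default_rng(0).normal(size=(n, n))
K = A @ A.T + 1e-6 * np.eye(n)   # stand-in for stabilize(cov(X))

t0 = perf_counter()
np.linalg.cholesky(K)            # one O(n^3) factorization
print(f"one {n}x{n} Cholesky: {perf_counter() - t0:.3f} s")

# sample_posterior_predictive repeats this for every one of the
# (chains x draws) posterior samples, even though the corresponding
# factorizations were already evaluated during MCMC.
```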
Proposed solution

We could introduce an optional `trace_cholesky: str` kwarg that wraps the `cholesky(stabilize(cov(X)))` in `_build_prior` in a `pm.Deterministic(trace_cholesky, ...)`, making it end up in `InferenceData.posterior`.
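Sketched against `gp.Latent._build_prior`, that could look roughly like this (a paraphrase of the idea, not a diff against the actual implementation; `cholesky` and `stabilize` stand for the helpers that module already uses, and `trace_cholesky` is the proposed kwarg):

```python
# Proposed user-facing call: gp.prior("f", X=X, trace_cholesky="f_chol")
def _build_prior(self, name, X, reparameterize=True, trace_cholesky=None, **kwargs):
    mu = self.mean_func(X)
    chol = cholesky(stabilize(self.cov_func(X)))
    if trace_cholesky is not None:
        # Record the factorization as a Deterministic so it ends up in
        # InferenceData.posterior next to the scaling/lengthscale draws.
        chol = pm.Deterministic(trace_cholesky, chol)
    if reparameterize:
        v = pm.Normal(name + "_rotated_", mu=0.0, sigma=1.0, size=X.shape[0], **kwargs)
        f = pm.Deterministic(name, mu + chol.dot(v))
    else:
        f = pm.MvNormal(name, mu=mu, chol=chol, **kwargs)
    return f
```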
Similarly, we could add a `from_cholesky` kwarg to `gp.conditional` to automatically replace the `cholesky(stabilize(cov(X)))` re-creation in `_build_conditional` with a `modelcontext(None)[from_cholesky]` link to the corresponding `pm.Deterministic` model variable from before.
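In `_build_conditional` the corresponding change might be as small as this (again only a sketch; `from_cholesky` is the hypothetical kwarg and the surrounding code is elided):

```python
from pymc.model import modelcontext

def _build_conditional(self, name, Xnew, from_cholesky=None, **kwargs):
    if from_cholesky is not None:
        # Link to the Deterministic that was traced during the original fit
        # instead of rebuilding cholesky(stabilize(cov(X))) from scratch.
        L = modelcontext(None)[from_cholesky]
    else:
        L = cholesky(stabilize(self.cov_func(self.X)))
    # ... the conditional mean and covariance are then built from L as before ...
```

A user would then write something like `gp.conditional("f_pred", Xnew=Xnew, from_cholesky="f_chol")` after having passed `trace_cholesky="f_chol"` to the prior.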
In `pm.sample_posterior_predictive` we'd need to replace not only RVs, but also `Deterministic` variables with input nodes fed from the posterior samples, if we're not doing that already(?).
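Conceptually that replacement is just a graph substitution; a standalone toy version of the idea in Aesara (not PyMC internals, just an illustration of swapping an expensive node for an input variable):

```python
import numpy as np
import aesara
import aesara.tensor as at
from aesara.tensor.slinalg import cholesky

# Toy stand-in for the model graph: `chol` is the expensive node that was
# traced as a Deterministic, `pred` is some quantity downstream of it.
cov = at.dmatrix("cov")
chol = cholesky(cov)
pred = chol.sum()

# Swap the expensive node for an input variable that is fed directly from
# the posterior draws, so no Cholesky is computed at prediction time.
chol_from_trace = at.dmatrix("chol_from_trace")
(pred_cheap,) = aesara.clone_replace([pred], replace={chol: chol_from_trace})

predict = aesara.function([chol_from_trace], pred_cheap)
print(predict(np.linalg.cholesky(np.eye(3) * 4.0)))  # -> 6.0, no Cholesky evaluated
```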
Note that unless we have some Cholesky caching mechanism in Aesara (which I think doesn't exist), this would also speed up `pm.sample_posterior_predictive` running in the same notebook session.

@bwengals @fonnesbeck @aseyboldt what do you think?