
Move the warning of using the sequential chain method to the constructor #893

Merged: 7 commits merged into pyro-ppl:master on Jan 26, 2021

Conversation

fehiepsi (Member) commented:
Tried to address the memory leak reported by @PaoloRanzi81 in #539 for chain_method='sequential' on 1 GPU, but I haven't found the root cause yet. Still, I think it would be nice to raise the warning about there not being enough devices to run the parallel method as early as possible.

Also, use jax.local_device_count instead of jax.lib.xla_bridge.device_count to compute the number of available devices, as mentioned in the docs of pmap.
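
[Editor's note] A minimal sketch of the idea described above, assuming a hypothetical MCMCSketch class (this is not the actual numpyro source, and the warning text is paraphrased): the device check moves into the constructor so the warning fires as soon as the object is built.

import warnings

import jax

class MCMCSketch:  # hypothetical name, trimmed to the relevant check
    def __init__(self, num_chains=1, chain_method='parallel'):
        self.num_chains = num_chains
        self.chain_method = chain_method
        # jax.local_device_count() counts the devices pmap can actually use
        # on this host; jax.device_count() counts devices across all hosts.
        if chain_method == 'parallel' and num_chains > jax.local_device_count():
            # Fall back and warn at construction time, before sampling starts.
            self.chain_method = 'sequential'
            warnings.warn('There are not enough devices to run parallel chains;'
                          ' chains will be drawn sequentially.')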

fehiepsi mentioned this pull request on Jan 26, 2021
fehiepsi removed the easy label on Jan 26, 2021
-        states, last_state = lax.map(partial_map_fn, map_args)
-    elif chain_method == 'parallel':
+    if self.chain_method == 'sequential':
+        states, last_state = _laxmap(partial_map_fn, map_args)
fehiepsi (Member, Author) commented on Jan 26, 2021:

Using lax.map here is a bit faster but expensive in terms of memory, so I switched to a for loop (loop over chains; for each chain we still use lax.scan if progress_bar=False and jit(sample_fn) if progress_bar=True to draw samples; see the sketch after the benchmark). Here is the benchmark on GPU (on CPU, the performance is similar):

%%time
%%memit

import jax

import numpyro; numpyro.set_platform("gpu")
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model():
    numpyro.sample("x", dist.Normal(0, 1).expand([100]))

mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=10000, num_chains=8, progress_bar=False)
mcmc.run(jax.random.PRNGKey(0))
samples = mcmc.get_samples(group_by_chain=True)["x"]
print(samples.shape)

Using lax.map:

(8, 10000, 100)
peak memory: 5462.81 MiB, increment: 5415.39 MiB
CPU times: user 9min 23s, sys: 6.47 s, total: 9min 29s
Wall time: 9min 25s

Using the for-loop _laxmap:

(8, 10000, 100)
peak memory: 1795.53 MiB, increment: 1748.04 MiB
CPU times: user 9min 41s, sys: 5.9 s, total: 9min 47s
Wall time: 9min 44s
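
[Editor's note] For readers following along, a minimal sketch of the for-loop idea, under a hypothetical name (the PR's actual helper is _laxmap and differs in details, e.g. it jits the per-index lookup):

import jax
import jax.numpy as jnp

def laxmap_sketch(f, xs):  # hypothetical name; the PR's helper is _laxmap
    # Apply f to each leading-axis slice of the pytree xs with a Python loop,
    # so only one chain's intermediates are live at a time (lax.map keeps the
    # whole batch on device, hence the higher peak memory above).
    n = jax.tree_util.tree_leaves(xs)[0].shape[0]
    ys = [f(jax.tree_util.tree_map(lambda leaf: leaf[i], xs)) for i in range(n)]
    # Stack the per-slice outputs back into one batched pytree.
    return jax.tree_util.tree_map(lambda *leaves: jnp.stack(leaves), *ys)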

cc @PaoloRanzi81 I'm not sure this will be enough to solve the OOM in your model, but I guess it could help a bit.

A project member replied:

This seems more than acceptable for the memory savings!

chain_method = 'sequential'
warnings.warn('There are not enough devices to run parallel chains: expected {} but got {}.'
              ' Chains will be drawn sequentially. If you are running MCMC in CPU,'
              ' consider to use `numpyro.set_host_device_count({})` at the beginning'
A project member commented on these lines:

nit: consider using...devices are available...
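
[Editor's note] As background for the warning text above, a minimal usage sketch of the suggested fix (numpyro.set_host_device_count is a real numpyro API; the chain count of 4 is illustrative):

import numpyro

# Must run before JAX initializes its backend, i.e. at the very top of the
# program: it splits the host CPU into 4 visible XLA devices.
numpyro.set_host_device_count(4)

import jax
print(jax.local_device_count())  # 4 on CPU when called early enough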

neerajprad merged commit 16edc9f into pyro-ppl:master on Jan 26, 2021