---
title: "Future of the tools"
format:
revealjs:
preview-links: auto
logo: "images/logo.png"
footer: "[mrc-ide/odin-monty-workshop-2025](.)"
execute:
echo: true
message: true
output: true
warning: true
---
# What is new?
* Since `odin` v1 (classic odin, pre-2020)
- comparison to data and likelihood support
- run multiple sets of parameters at once
- run in parallel
# What is new?
* Since `odin.dust` (2024 rewrite)
- more efficient parameter updating
- parameter packers
- better parallelism
- periodic variable resetting (`zero_every`; see the sketch after this list)
- better error messages
- compile time array bounds checking
- debugging support
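
A minimal sketch of `zero_every`, using an illustrative model (the names `cases`, `incidence` and `lambda` are made up here): `incidence` accumulates within each unit of time and is reset to zero as each new unit begins.

```{r, eval=FALSE}
sys <- odin2::odin({
  cases <- Poisson(lambda)
  update(x) <- x + cases
  update(incidence) <- incidence + cases
  initial(x) <- 0
  # reset at the start of every time unit, so `incidence` holds
  # the count accumulated within the current unit only
  initial(incidence, zero_every = 1) <- 0
  lambda <- parameter(2)
})
```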
# Syntax changes
* `user()` -> `parameter()`
* Discrete models have a proper time basis, `dt` is now a reserved word
* No longer use R's names for distribution functions (sketch below)
* Named arguments allow clearer code
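
For instance, a stochastic update that used `user()` and `rnorm()` in classic odin might now be written like this (a sketch of the new syntax only; automatic migration of the old form is shown next):

```{r, eval=FALSE}
update(y) <- y + Normal(0, sd)  # Normal() replaces R's rnorm()
initial(y) <- 0
sd <- parameter()               # parameter() replaces user()
```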
## Automatic migration
```{r}
sys <- odin2::odin({
  update(y) <- y + rnorm(0, sd)
  initial(y) <- 0
  sd <- user()
})
```
. . .
You can use `odin_migrate()` to rewrite code.
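
A sketch of its use, with placeholder file paths:

```{r, eval=FALSE}
# Rewrite odin v1 source into the new syntax, writing the
# result to the second path
odin2::odin_migrate("model-v1.R", "model-v2.R")
```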
# Limitations
* Much slower compilation time (we plan to mitigate this by using JavaScript)
* Delays are less flexible than in version 1 (they cannot be used in discrete-time models, and the default argument has been removed)
# Practical considerations
* Handful of missing features from `odin` v1 and `dust` v1 (`odin.dust`)
- delayed delays
- mixed time models
- compilation to JavaScript
- extendable via C++
# Practical considerations
* The great package migration
- `dust2` to `dust`
- `odin2` to `odin` and all onto CRAN
- Once on CRAN, our ability to change the `dust` and `monty` C++ code is reduced
# Planned new features
## GPU support
* Massively parallel stochastic models
- Proof-of-concept: 1 consumer GPU ≈ 5-10 32-core nodes
* Simulation with many parameter sets is harder
## MPI/HPC support
* Alternative approach to parallelism
- based on message passing, rather than shared memory
* Use CPU-based HPC with fast networking
* We are interested in hearing about models that can take advantage of these levels of parallelism
## More radical changes to the DSL?
* Support for events
* More bounds checking and debugging support
* Vector-returning functions (multinomial, matrix multiplication, etc.)
* Describe models in terms of flows
* Composable sub-models (I am told this is very hard!)
* Improve monty's little DSL!
* What else?
## Improvement of supported particle methods?
* SMC^2^, IF^2^
* PF other than bootstrap
* Methods based on estimates of ratio of density rather than ratio of density estimates
# Automatic differentiation
## Gradient vs random walk
- **Goal**: Sample from the posterior efficiently
- 🐢 **Random Walk MCMC**:
- No knowledge of shape of posterior
- Can get stuck in tight or curved regions
- ⚡ **Gradient-based methods**:
- Use the local slope to move efficiently
- Better scaling in high dimensions
## 🍌 The Banana Problem {.smaller}
::::: columns
::: {.column width="50%"}
```{r}
library(monty)
m <- monty_example("banana", sigma = 0.5)
a <- seq(-2, 6, length.out = 1000)
b <- seq(-2.5, 2.5, length.out = 1000)
z <- outer(a, b, function(alpha, beta) {
  exp(monty_model_density(m, rbind(alpha, beta)))
})
```
- This posterior has a **strong nonlinear correlation**
- Random walk proposals struggle to explore this space
:::
::: {.column width="50%"}
<div style="margin-top: -1em">
```{r, echo=FALSE, fig.width=6, fig.height=6, out.width="100%"}
par(mar = c(3, 3, 1, 1))
image(a, b, z, col = hcl.colors(30), xlab = "alpha", ylab = "beta")
```
</div>
:::
:::::
## 🐢 Random Walk MCMC: Limitation {.smaller}
::::: columns
::: {.column width="50%"}
```{r sampling_RW}
set.seed(42)
sampler_rw <- monty_sampler_random_walk(vcv = diag(2) * 1.5)
samples_rw <- monty_sample(m, sampler_rw, n_steps = 1000, initial = c(0, 0))
```
- Acceptance rate `r length(unique(samples_rw$pars[1,,1]))/1000`
- Small steps to avoid rejection → **slow mixing**
- Misses curved geometry
- Inefficient in higher dimensions
:::
::: {.column width="50%"}
<div style="margin-top: -1em">
```{r, echo=FALSE, fig.width=5, fig.height=5, out.width="100%"}
par(mar = c(3, 3, 1, 1))
plot(samples_rw$pars[1,,1], type = "l", main = "Random Walk MCMC")
```
</div>
:::
:::::
## ⚡ Gradient-Based: Faster & Smarter {.smaller}
::::: columns
::: {.column width="50%"}
```{r sampling_HMC}
sampler_hmc <- monty_sampler_hmc(epsilon = 0.2, n_integration_steps = 10)
samples_hmc <- monty_sample(m, sampler_hmc, n_steps = 1000, initial = c(0, 0))
```
- Acceptance rate `r length(unique(samples_hmc$pars[1,,1]))/1000`
- Uses **gradient of the log posterior**
- Efficiently explores curved shapes
- Much **better mixing** in fewer steps
- But potentially expensive to compute gradients
:::
::: {.column width="50%"}
<div style="margin-top: -1em">
```{r, echo=FALSE, fig.width=5, fig.height=5, out.width="100%"}
par(mar = c(3, 3, 1, 1))
plot(samples_hmc$pars[1,,1], type = "l", main = "HMC")
```

</div>
:::
:::::
## Reverse AutoDiff in odin
- Think of your model as a **computational graph**: data + parameters → output
- Reverse AD walks **backward** through this graph to efficiently compute gradients
- ✅ **More accurate** than numerical methods
- ✅ **Much faster** (especially in high dimensions)
🛠 In `odin`, you write the model normally — gradients come for free
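
As a quick illustration, the banana model from earlier exposes a gradient, and monty can evaluate it directly; this is what gradient-based samplers use under the hood:

```{r}
monty_model_gradient(m, c(0, 0))
```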
## ✅ Summary
- Gradient-based methods like **HMC/NUTS**:
- Are more efficient, especially for complex or high-dimensional posteriors
- Adapt to local geometry (no random-walk scale to tune!)
- Often yield **better convergence diagnostics**
- 🚀 For users fitting models: you'll get **faster, more reliable inference** with gradients when available!
## 🗺️ Autodiff roadmap
* Simple support implemented as a proof-of-concept
- deterministic discrete time models with no arrays
* Expand to support ODE models, models with arrays
* Fully implement algorithms in monty that can exploit gradients
- HMC, NUTS, variational inference
## Parallel tempering
