---
title: Parallelising template building
subtitle: with qbatch
author: Alessandro Felder, Niko Sirmpilatze
execute:
  enabled: true
format:
    revealjs:
        theme: [default, niu-light.scss]
        logo: img/logo_niu_light.png
        footer: "Parallelising template building | 2025-01-08"
        slide-number: c
        menu:
            numbers: true
        chalkboard: true
        scrollable: true
        preview-links: false
        view-distance: 10
        mobile-view-distance: 10
        auto-animate: true
        auto-play-media: true
        code-overflow: wrap
        highlight-style: atom-one
        mermaid:
            theme: neutral
            fontFamily: arial
            curve: linear
    html:
        theme: [default, niu-dark.scss]
        logo: img/logo_niu_dark.png
        date: "2025-01-08"
        toc: true
        code-overflow: scroll
        highlight-style: atom-one
        mermaid:
            theme: neutral
            fontFamily: arial
            curve: linear
        margin-left: 0
        embed-resources: true
        page-layout: full
my-custom-stuff:
    my-reuseable-variable: "I can use this wherever I want in the markdown, and change it in only one place :)"
---
## What is an atlas? {.smaller}
:::: {.columns}
::: {.column width="50%"}
![](img/allen_mouse_template.png){fig-align=center}
:::
::: {.column width="50%"}
![](img/allen_mouse_annotation.png){fig-align=center}
:::
::::

[(Neuro-anatomical) Atlases](https://neuroinformatics.dev/slides-templates-atlases/) consist of a template image and an annotation image (e.g. the [Allen Mouse Brain Common Coordinate Framework](https://doi.org/10.1016/j.cell.2020.04.007)).

## Why is it useful?

A standardised annotation and a standardised coordinate system, to which experimental data can be registered, facilitate data comparability and (re-)combination, and therefore collaboration and data sharing.

[Atlases have accelerated neuroscientific discovery in species for which one exists!](https://docs.google.com/presentation/d/1NpZva4iIX3OW9Uc-Gxk9lc7nCH7kHUgGAA9-xwjXhKc/edit#slide=id.g30d92c812f7_0_3)

## How are they made?
![](img/ANTs-Template-Construction.png){width=900 fig-align=center}

Good template images are an unbiased average of many individuals (from ~10 to 1000s). Building one requires many computationally intensive image registrations.

## A new template image {.smaller}
:::: {.columns}
::: {.column width="30%"}
![](img/black_cap_whole.png){fig-align=center}

We made a template image of the Eurasian Blackcap from 18 hemispheres. At 25 µm resolution, this ran (sequentially) on the HPC for **~two weeks**.
:::
::: {.column width="70%"}
![](img/Figure3.png){fig-align=center}
:::
::::
## A more difficult challenge {.smaller}
:::: {.columns}
::: {.column width="30%"}
We have 45 mole rat hemispheres.

Averaging just 6 of them sequentially at low resolution took >2 days.

We estimate that averaging all 45 at high resolution would take **months**.
:::
::: {.column width="70%"}
![](img/malkemper-lab.png){fig-align=center}
:::
::::
## Registration steps are independent
![](img/ANTs-Template-Construction.png){width=900 fig-align=center}
So maybe we can parallelise?
## The template building tech stack

- `brainglobe-template-builder`
    - preprocessing GUIs and high-level bash scripts/slurm jobs
- [**`modelbuild.sh`** from optimisedANTs (CoBrALab)](https://github.com/CoBrALab/optimized_antsMultivariateTemplateConstruction/blob/master/modelbuild.sh)
    - which wraps ANTs (Advanced Normalization Tools)
        - which is built on top of ITK (Insight Toolkit)

## A look inside `modelbuild.sh`
Write files containing bash commands
```{.bash}
echo antsRegistration_affine_SyN.sh --clobber \
... # lots of arguments...
>>${_arg_output_dir}/jobs/${__datetime}/${reg_type}_${i}_reg
```
then execute them with `qbatch`
```{.bash}
qbatch ${_arg_block} \
... # more arguments...
${_arg_output_dir}/jobs/${__datetime}/${reg_type}_${i}_reg
```

> qbatch is a tool for executing commands in parallel across a compute cluster.

[`qbatch`](https://github.com/CoBrALab/qbatch/) is also developed by the CoBrA lab.
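
The write-then-submit pattern above can be tried without a cluster. The following is a minimal sketch, not the code from `modelbuild.sh`: the job file name, the subject labels and the echoed placeholder command are all hypothetical stand-ins, and the job file is run with plain `sh` where the real script would hand it to `qbatch`.

```bash
# Hypothetical stand-ins: a temporary job directory and three made-up subjects
job_dir="$(mktemp -d)"
job_file="${job_dir}/affine_1_reg"

for subject in sub-a sub-b sub-c; do
    # Append one command per line to the job file; nothing runs yet
    echo echo "registering ${subject}" >>"${job_file}"
done

# Count the queued commands (tr strips the padding some wc versions add)
n_commands=$(wc -l <"${job_file}" | tr -d ' ')
echo "${n_commands} commands queued in ${job_file}"

# Stand-in for submission: run the queued commands one after another
output="$(sh "${job_file}")"
echo "${output}"
```

Because each line of the job file is a complete command, a scheduler wrapper is free to decide how many lines go into each job and how many run at once.
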
## Example registration file
``` {.bash}
... # export lots of variables
antsRegistration_affine_SyN.sh --clobber --no-fast --histogram-matching --skip-nonlinear --linear-type affine --no-mask-extract --moving-mask /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-b07_hemi-L/sub-b07_hemi-L_res-40um_orig-asr_N4_aligned_padded_use4template/sub-b07_hemi-L_res-40um_sym-mask.nii.gz --initial-transform /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/transforms/sub-b07_hemi-L_res-40um_sym-brain_0GenericAffine.mat --convergence 1e-9 -o /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/resample/sub-b07_hemi-L_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-b07_hemi-L/sub-b07_hemi-L_res-40um_orig-asr_N4_aligned_padded_use4template/sub-b07_hemi-L_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/average/template_sharpen_shapeupdate.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/transforms/sub-b07_hemi-L_res-40um_sym-brain_
antsRegistration_affine_SyN.sh --clobber --no-fast --histogram-matching --skip-nonlinear --linear-type affine --no-mask-extract --moving-mask /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-d07_hemi-R/sub-d07_hemi-R_res-40um_orig-asr_N4_aligned_padded_use4template/sub-d07_hemi-R_res-40um_sym-mask.nii.gz --initial-transform /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/transforms/sub-d07_hemi-R_res-40um_sym-brain_0GenericAffine.mat --convergence 1e-9 -o /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/resample/sub-d07_hemi-R_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-d07_hemi-R/sub-d07_hemi-R_res-40um_orig-asr_N4_aligned_padded_use4template/sub-d07_hemi-R_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/average/template_sharpen_shapeupdate.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/transforms/sub-d07_hemi-R_res-40um_sym-brain_
... # 43 more calls to antsRegistration
```
## Example registration file (simplified)
``` {.bash}
... # export lots of variables
antsRegistration_affine_SyN.sh ...
antsRegistration_affine_SyN.sh ...
... # 43 more calls to antsRegistration
```
## qbatch
- works with slurm 🎉 (and other job managers)
- confusing terminology and limited, but good, documentation 😕
## qbatch
``` {.bash code-line-numbers="0-3|7|10-11"}
$ export QBATCH_PPJ=12 # requested processors per job
$ export QBATCH_CHUNKSIZE=$QBATCH_PPJ # commands to run per job
$ export QBATCH_CORES=$QBATCH_PPJ # commands to run in parallel per job
$ export QBATCH_NODES=1 # number of compute nodes to request for the job, typically for MPI jobs
$ export QBATCH_MEM="0" # requested memory per job
$ export QBATCH_MEMVARS="mem" # memory request variable to set
$ export QBATCH_SYSTEM="pbs" # queuing system to use ("pbs", "sge","slurm", or "local")
$ export QBATCH_NODES=1 # (PBS-only) nodes to request per job
$ export QBATCH_SGE_PE="smp" # (SGE-only) parallel environment name
$ export QBATCH_QUEUE="1day" # Name of submission queue
$ export QBATCH_OPTIONS="" # Arbitrary cluster options to embed in all jobs
$ export QBATCH_SCRIPT_FOLDER=".qbatch/" # Location to generate jobfiles for submission
$ export QBATCH_SHELL="/bin/sh" # Shell to use to evaluate jobfile
```
## Easy wins
``` {.bash }
$ export QBATCH_SYSTEM="slurm"
$ export QBATCH_QUEUE="cpu"
```
## A naive attempt
"Run 15 registrations at a time, on 15 processors, on the same node"
``` {.bash }
$ export QBATCH_PPJ=15
$ export QBATCH_CHUNKSIZE=15
$ export QBATCH_CORES=15
```

Parallel `antsRegistration` commands compete with each other for processing resources on the same node, making this even slower than a sequential run. Multi-threading is [built into ITK](https://itk.org/Doxygen/html/ThreadingPage.html), so each `antsRegistration` process already tries to spread across the processors of the node.

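
A rough sketch of why the naive settings oversubscribe the node. It assumes (this was not measured in the talk) that every registration process starts as many threads as there are allocated processors. It also shows `ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS`, the environment variable ITK reads for its default thread count, as one possible way to cap threads per command. This cap was not used here, and whether the wrapper scripts override it has not been checked.

```bash
processors=15         # QBATCH_PPJ in the naive attempt
parallel_commands=15  # QBATCH_CORES in the naive attempt

# Assumption: each process starts one thread per allocated processor
threads_wanted=$(( processors * parallel_commands ))
echo "${threads_wanted} threads competing for ${processors} processors"

# A fair share of the processors for each concurrently running command
threads_per_command=$(( processors / parallel_commands ))

# Possible mitigation (not tried in this talk): cap ITK's default thread count
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS="${threads_per_command}"
echo "cap: ${ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS} thread(s) per command"
```
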
## Optimisation 🎉
``` {.bash code-line-numbers="1-3"}
$ export QBATCH_PPJ=12 # each antsRegistration call can use 12 processors
$ export QBATCH_CHUNKSIZE=3
$ export QBATCH_CORES=1
```

Split the 45 registration commands into chunks of 3, run each chunk in a separate job (15 jobs, so 15 nodes), and run the commands sequentially within each job.

Massive speed-up.
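
The chunking arithmetic can be reproduced with standard tools. This sketch only mimics the idea with `split` and is not how `qbatch` is implemented; the file names and the placeholder commands are hypothetical.

```bash
# Hypothetical command list: 45 placeholder registrations, one per line
work_dir="$(mktemp -d)"
commands_file="${work_dir}/commands.txt"
for i in $(seq 1 45); do
    echo "echo registration ${i}"
done >"${commands_file}"

chunksize=3  # same role as QBATCH_CHUNKSIZE

# Write each group of 3 consecutive commands to its own chunk file
split -l "${chunksize}" "${commands_file}" "${work_dir}/chunk_"

# One chunk file corresponds to one job
n_jobs=$(ls "${work_dir}"/chunk_* | wc -l | tr -d ' ')
echo "${n_jobs} jobs of ${chunksize} commands each"
```

With one job per node and the commands inside each job run one after another, the wall time is roughly that of 3 registrations instead of 45.
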
## More optimisation 🎉
Exclude two of our CPU nodes that are a lot slower (and weirdly named starting with "gpu")
``` {.bash }
export QBATCH_OPTIONS="--exclude=gpu-380-24,gpu-380-25"
```
## Customise time-outs for larger jobs
Expand default wall times for short, medium and long jobs
``` {.bash code-line-numbers="8-10"}
bash modelbuild.sh --output-dir "${working_dir}" \
--starting-target first \
--stages rigid,similarity,affine,nlin \
--masks "${working_dir}/mask_paths.txt" \
--average-type "${average_type}" \
--average-prog "${average_prog}" \
--reuse-affines \
--walltime-short "01:30:00" \
--walltime-linear "02:15:00" \
--walltime-nonlinear "13:30:00" \
--no-dry-run \
"${working_dir}/brain_paths.txt"
```
## Struggles ⚠️

- it took time to understand the brief docs
- `antsRegistration` (from ANTs) is built to run across the processors of a node (empirically verified)
    - this can lead to massive slowdowns when threads from different `antsRegistration` processes compete
- `qbatch` doesn't seem to be maintained: last commit 4 years ago (but it "works"?!)

## Tricks 🪄
* exclude some nodes
* increase slurm memory
* increase walltimes
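
Put together, the environment for a run might look like the sketch below. All values except the memory request come from the earlier slides. The `QBATCH_MEM` value is a made-up placeholder, and using `QBATCH_MEM` at all is an assumption about how the memory was raised. Walltimes are set separately through the `--walltime-*` flags of `modelbuild.sh`.

```bash
export QBATCH_SYSTEM="slurm"   # queuing system
export QBATCH_QUEUE="cpu"      # submission queue (partition)
export QBATCH_PPJ=12           # processors requested per job
export QBATCH_CHUNKSIZE=3      # commands per job
export QBATCH_CORES=1          # commands run in parallel within a job
export QBATCH_OPTIONS="--exclude=gpu-380-24,gpu-380-25"  # skip the two slow nodes
export QBATCH_MEM="16G"        # hypothetical value, not from the talk
```
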
## Conclusions {.smaller}
:::: {.columns}
::: {.column width="30%"}
* big step forward in template-making at NIU
* 45 mole rats for the price of 3 🤑
* maybe useful elsewhere
* atlas packaging? (maybe even from Python)
:::
::: {.column width="70%"}
![](img/malkemper-lab.png){fig-align=center}
:::
::::