index.qmd

---
title: Parallelising template building
subtitle: with qbatch
author: Alessandro Felder, Niko Sirmpilatze
execute: 
  enabled: true
format:
    revealjs:
        theme: [default, niu-light.scss]
        logo: img/logo_niu_light.png
        footer: "Parallelising template building | 2025-01-08"
        slide-number: c
        menu:
            numbers: true
        chalkboard: true
        scrollable: true
        preview-links: false
        view-distance: 10
        mobile-view-distance: 10
        auto-animate: true
        auto-play-media: true
        code-overflow: wrap
        highlight-style: atom-one
        mermaid: 
          theme: neutral
          fontFamily: arial
          curve: linear
    html:
        theme: [default, niu-dark.scss]
        logo: img/logo_niu_dark.png
        date: "2025-01-08"
        toc: true
        code-overflow: scroll
        highlight-style: atom-one
        mermaid: 
          theme: neutral
          fontFamily: arial
          curve: linear
          margin-left: 0
        embed-resources: true
        page-layout: full
my-custom-stuff:
   my-reuseable-variable: "I can use this wherever I want in the markdown, and change it in only once place :)"
---

## What is an atlas? {.smaller}

:::: {.columns}

::: {.column width="50%"}
![](img/allen_mouse_template.png){fig-align=center}
:::

::: {.column width="50%"}
![](img/allen_mouse_annotation.png){fig-align=center}
:::

::::

[(Neuro-anatomical) Atlases](https://neuroinformatics.dev/slides-templates-atlases/) consist of a template image and an annotations image (E.g. [The Allen Mouse Brain Common Coordinate Framework](https://doi.org/10.1016/j.cell.2020.04.007))

## Why is it useful?

A standardised annotation, standardised coordinate system to which experimental data can be registered, facilitate data comparability and (re-)combination, and therefore collaboration and data sharing.

[Have accelerated neuroscientific discovery, for species where there is an atlas!](https://docs.google.com/presentation/d/1NpZva4iIX3OW9Uc-Gxk9lc7nCH7kHUgGAA9-xwjXhKc/edit#slide=id.g30d92c812f7_0_3)

## How are they made?

![](img/ANTs-Template-Construction.png){width=900 fig-align=center}

Good template images are an unbiased average of many (~10-1000s) of individuals. This requires many computationally intensive image registrations.

## A new template image {.smaller}

:::: {.columns}

::: {.column width="30%"}
![](img/black_cap_whole.png){fig-align=center}

We made a template image of the Eurasian Blackcap from 18 hemispheres. At 25 um resolution, this ran (sequentially) on the HPC for **~two weeks**.
:::

::: {.column width="70%"}
![](img/Figure3.png){fig-align=center}
:::

::::

## A more difficult challenge {.smaller}

:::: {.columns}

::: {.column width="30%"}
We have 45 molerat hemispheres. 

Averaging just 6 of them at low-res sequentially took >2 days.

We estimate that for 45 at high-res it would take **months**.
:::

::: {.column width="70%"}
![](img/malkemper-lab.png){fig-align=center}
:::

::::

## Registration steps are independent

![](img/ANTs-Template-Construction.png){width=900 fig-align=center}

So maybe we can parallelise?

## The template building tech stack

- `brainglobe-template-builder`
  - preprocessing GUIs and high-level bash scripts/slurm jobs
- [**`model_build.sh`** from optimisedANTs (CoBrALab)](https://github.com/CoBrALab/optimized_antsMultivariateTemplateConstruction/blob/master/modelbuild.sh)
- which wraps ANTs (Advanced Normalisation Tools)
- which is built on top of ITK (Insight ToolKit)

## A look inside `model_build.sh`

Write files containing bash commands
```{.bash}
echo antsRegistration_affine_SyN.sh --clobber \
  ... # lots of arguments...  
  >>${_arg_output_dir}/jobs/${__datetime}/${reg_type}_${i}_reg
```
then execute them with `qbatch`
```{.bash}
qbatch ${_arg_block} \
  ... # more arguments...
  ${_arg_output_dir}/jobs/${__datetime}/${reg_type}_${i}_reg
```

> qbatch is a tool for executing commands in parallel across a compute cluster.

[`qbatch`](https://github.com/CoBrALab/qbatch/) is also developed by the CoBrA lab.

## Example registration file
``` {.bash}
... # export lots of variables
antsRegistration_affine_SyN.sh --clobber --no-fast --histogram-matching --skip-nonlinear --linear-type affine --no-mask-extract --moving-mask /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-b07_hemi-L/sub-b07_hemi-L_res-40um_orig-asr_N4_aligned_padded_use4template/sub-b07_hemi-L_res-40um_sym-mask.nii.gz --initial-transform /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/transforms/sub-b07_hemi-L_res-40um_sym-brain_0GenericAffine.mat --convergence 1e-9 -o /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/resample/sub-b07_hemi-L_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-b07_hemi-L/sub-b07_hemi-L_res-40um_orig-asr_N4_aligned_padded_use4template/sub-b07_hemi-L_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/average/template_sharpen_shapeupdate.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/transforms/sub-b07_hemi-L_res-40um_sym-brain_
antsRegistration_affine_SyN.sh --clobber --no-fast --histogram-matching --skip-nonlinear --linear-type affine --no-mask-extract --moving-mask /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-d07_hemi-R/sub-d07_hemi-R_res-40um_orig-asr_N4_aligned_padded_use4template/sub-d07_hemi-R_res-40um_sym-mask.nii.gz --initial-transform /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/transforms/sub-d07_hemi-R_res-40um_sym-brain_0GenericAffine.mat --convergence 1e-9 -o /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/resample/sub-d07_hemi-R_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-d07_hemi-R/sub-d07_hemi-R_res-40um_orig-asr_N4_aligned_padded_use4template/sub-d07_hemi-R_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/average/template_sharpen_shapeupdate.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/transforms/sub-d07_hemi-R_res-40um_sym-brain_
... # 43 more calls to antsRegistration
```

## Example registration file (more simplified)
``` {.bash}
... # export lots of variables
antsRegistration_affine_SyN.sh ...
antsRegistration_affine_SyN.sh ...
... # 43 more calls to antsRegistration
```

## qbatch

- works with slurm 🎉 (and other job managers)
- confusing terminology and limited, but good, documentation 😕

## qbatch 

``` {.bash code-line-numbers="0-3|7|10-11"}
$ export QBATCH_PPJ=12                   # requested processors per job
$ export QBATCH_CHUNKSIZE=$QBATCH_PPJ    # commands to run per job
$ export QBATCH_CORES=$QBATCH_PPJ        # commonds to run in parallel per job
$ export QBATCH_NODES=1                  # number of compute nodes to request for the job, typically for MPI jobs
$ export QBATCH_MEM="0"                  # requested memory per job
$ export QBATCH_MEMVARS="mem"            # memory request variable to set
$ export QBATCH_SYSTEM="pbs"             # queuing system to use ("pbs", "sge","slurm", or "local")
$ export QBATCH_NODES=1                  # (PBS-only) nodes to request per job
$ export QBATCH_SGE_PE="smp"             # (SGE-only) parallel environment name
$ export QBATCH_QUEUE="1day"             # Name of submission queue
$ export QBATCH_OPTIONS=""               # Arbitrary cluster options to embed in all jobs
$ export QBATCH_SCRIPT_FOLDER=".qbatch/" # Location to generate jobfiles for submission
$ export QBATCH_SHELL="/bin/sh"          # Shell to use to evaluate jobfile
```

## Easy wins

``` {.bash }
$ export QBATCH_SYSTEM="slurm"
$ export QBATCH_QUEUE="cpu"
```

## A naive attempt

"Run 15 registrations at a time, on 15 processors, on the same node"
``` {.bash }
$ export QBATCH_PPJ=15
$ export QBATCH_CHUNKSIZE=15
$ export QBATCH_CORES=15
```
Parallel `antsRegistration` commands compete with each other for processing resources on the same node, making this even slower than a sequential run. This is [built into ITK](https://itk.org/Doxygen/html/ThreadingPage.html).

## Optimisation 🎉

``` {.bash code-line-numbers="1-3"}
$ export QBATCH_PPJ=12 # each antsRegistration call can use 12 processors
$ export QBATCH_CHUNKSIZE=3
$ export QBATCH_CORES=1
```
Split 45 jobs into chunks of 3, run each chunk of 3 in a separate job (so use 15 nodes), and run sequentially within job. 

Massive speed-up.

## More optimisation 🎉

Exclude two of our CPU nodes that are a lot slower (and weirdly named starting with "gpu")

``` {.bash }
export QBATCH_OPTIONS="--exclude=gpu-380-24,gpu-380-25"
```

## Customise time-outs for larger jobs
Expand default wall times for short, medium and long jobs

``` {.bash code-line-numbers="8-10"}
bash modelbuild.sh --output-dir "${working_dir}" \
  --starting-target first \
  --stages rigid,similarity,affine,nlin \
  --masks "${working_dir}/mask_paths.txt" \
  --average-type "${average_type}" \
  --average-prog "${average_prog}" \
  --reuse-affines \
  --walltime-short "01:30:00" \
  --walltime-linear "02:15:00" \
  --walltime-nonlinear "13:30:00"\
  --no-dry-run \
  "${working_dir}/brain_paths.txt"
```

## Struggles ⚠️
- time to understand brief docs
- antsRegistration from ants is built to run across processors on a node (empirically verified)
    - can lead to massive slowdowns with threads from different antsRegistration processes competing
- doesn't seem maintained, last commit 4 years ago (but "works"?!)

## Tricks 🪄
* exclude some nodes
* increase slurm memory
* increase walltimes

## Conclusions {.smaller}

:::: {.columns}

::: {.column width="30%"}
* big step forward in template-making at NIU
  * 45 mole rats for the price of 3 🤑
* maybe useful elsewhere
  * atlas packaging? (maybe even from Python)
:::

::: {.column width="70%"}
![](img/malkemper-lab.png){fig-align=center}
:::

::::