This guide explains how to run a `lobster` training job using SLURM on a GPU-enabled system. It also describes which environment variables need to be exported for the job to run properly.
# SLURM Job Script
The provided example job script `scripts/train_ume.sh` is configured for training the `Ume` model on a GPU-enabled SLURM cluster.

You will need to set specific environment variables to run the job. These are read by the `Ume` hydra configuration file, located at `src/lobster/hydra_config/experiment/train_ume.yaml`.

Variables:

* `LOBSTER_DATA_DIR`: Path to the directory containing your training data. Datasets will be downloaded and cached to this directory (if `data.download` is set to `True` in the hydra configuration file).
* `LOBSTER_RUNS_DIR`: Path to the directory where training results (model checkpoints, logs, etc.) will be stored.
* `LOBSTER_USER`: The user entity for the logger (usually your wandb username).
* `WANDB_BASE_URL`: The base URL for the Weights & Biases service. Optional - only needed if your wandb account is not on the default wandb server.
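
As a minimal sketch, the variables above could be exported in the submitting shell before the job is queued. The paths, the username, and the commented-out URL are placeholders (assumptions, not values from this repository). The sketch also assumes it is run from the repository root and that `scripts/train_ume.sh` does not override `sbatch`'s default behavior of passing the submitting environment on to the job.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with paths and names for your cluster.
export LOBSTER_DATA_DIR="$HOME/lobster/data"   # training data and dataset cache
export LOBSTER_RUNS_DIR="$HOME/lobster/runs"   # checkpoints, logs, etc.
export LOBSTER_USER="your-wandb-username"      # logger entity

# Optional: only needed for a non-default wandb server (placeholder URL).
# export WANDB_BASE_URL="https://wandb.example.com"

# Submit only if SLURM's sbatch is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch scripts/train_ume.sh
else
    echo "sbatch not found; run this on a SLURM login node" >&2
fi
```

If the exports are placed in a shell profile instead, only the `sbatch` line needs to be run for each submission.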