Conversation

@vmcru vmcru (Contributor) commented Apr 7, 2025

This pull request integrates the new dataio files and paradigms. It replaces the bash scripts that run the experiment and the hyperparameter optimization with new Python files. The main changes are the train.py file, which now loads the data using the dataio files, a new run_experiments.py file, and, later, a new run_hparam_optimization.py file that integrates with Weights & Biases (wandb).

@Drew-Wagner Drew-Wagner (Collaborator) left a comment


My initial thoughts after a 30-second skim... I have not checked the experiment runner or train files yet.

@pplantinga pplantinga (Collaborator) left a comment


Here are some initial comments from a cursory review. Things look generally good, but I wonder if the definition of the hyperparameter sweep configuration might be worth some further thought.

As noted in the meeting, this could also support Weights & Biases.
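For reference, a Weights & Biases sweep is usually described by a small configuration dict. Below is a minimal sketch for hyperparameters like those in this recipe; the parameter names, ranges, and metric name are illustrative assumptions, not taken from the repo.

```python
# Hypothetical wandb sweep configuration; only the structure is the point.
# With the wandb library this would be registered via wandb.sweep(sweep_config),
# and each trial would read its sampled values from wandb.config.
sweep_config = {
    "method": "bayes",  # or "grid" / "random"
    "metric": {"name": "valid_acc", "goal": "maximize"},
    "parameters": {
        "dropout": {"min": 0.0, "max": 0.5},
        "lr": {"values": [1e-3, 5e-4, 1e-4]},
    },
}
print(sorted(sweep_config["parameters"]))  # ['dropout', 'lr']
```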

@@ -165,3 +215,29 @@ model: !new:models.EEGNet.EEGNet
dense_max_norm: !ref <dense_max_norm>
dropout: !ref <dropout>
dense_n_neurons: !ref <n_classes>

# Search Space

I like the search space concept!

I wonder if there's a way to reference the search space in the rest of the yaml file -- like:

dropout: !ref <search_space.dropout>


This might make it slightly clearer how this works, rather than having to use overrides. You might still have to reload the file each time, depending on how the reference works exactly.
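To illustrate the idea, here is a toy resolver for nested `<a.b>` references in a plain dict. It only sketches the behavior being proposed; hyperpyyaml's actual `!ref` syntax and resolution rules may differ.

```python
import re

def resolve_refs(config: dict) -> dict:
    """Toy stand-in for reference resolution (illustrative only): replaces
    "<a.b>" placeholder strings with values looked up inside the config."""
    def lookup(path):
        node = config
        for part in path.split("."):
            node = node[part]
        return node

    resolved = {}
    for key, value in config.items():
        if isinstance(value, str):
            match = re.fullmatch(r"<(.+)>", value)
            if match:
                value = lookup(match.group(1))
        resolved[key] = value
    return resolved

config = {
    "search_space": {"dropout": 0.25},
    "dropout": "<search_space.dropout>",  # resolves against search_space
}
print(resolve_refs(config)["dropout"])  # 0.25
```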

class ExperimentRunner:
"""Manages multiple MOABB experiment runs."""

def __init__(self, args: list):

maybe `__init__` can take the relevant arguments directly, and you can add a classmethod like `from_commandline_args` to parse the command line and construct the object, e.g.

def __init__(self, hparams, data_folder, output_folder, nsubj, nsess):
    """Do input validation or whatever."""

@classmethod
def from_commandline_args(cls, args):
    """Construct an instance from command-line arguments."""
    kwargs = cls.parse_args(args)
    return cls(**kwargs)

@staticmethod
def parse_args(args):
    parser = argparse.ArgumentParser()
    parser.add_argument(...)
    ...
    args = parser.parse_args(args)
    return vars(args)


"""
class MOABBBrain(sb.Brain):
"""Modified Brain class for MOABB experiments with new data format."""

def init_model(self, model):

Maybe model initialization should be done in the model files themselves?
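One common way to do this is to give the model a `reset_parameters` method, following the PyTorch convention, so initialization lives in the model file rather than in a Brain hook. The sketch below is illustrative only: the class name and init scheme are hypothetical, and real code would initialize torch tensors.

```python
class EEGModelSketch:
    """Hypothetical model class: initialization lives with the model itself
    via a reset_parameters method (the PyTorch convention), rather than in
    an external Brain.init_model hook."""

    def __init__(self, n_features: int):
        self.n_features = n_features
        self.weights = None
        self.reset_parameters()  # the model initializes itself

    def reset_parameters(self):
        # Placeholder uniform init; a real model would initialize tensors.
        bound = 1.0 / self.n_features
        self.weights = [bound] * self.n_features

model = EEGModelSketch(4)
print(model.weights)  # [0.25, 0.25, 0.25, 0.25]
```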

@@ -253,174 +387,109 @@ def check_if_best(


def run_experiment(hparams, run_opts, datasets):
"""This function performs a single training (e.g., single cross-validation fold)"""
idx_examples = np.arange(datasets["train"].dataset.tensors[0].shape[0])
"""Run a single experiment with the new data format."""

"New" data format may not mean anything to someone reading this file. Perhaps you can either explicitly write out the data format, or just state more clearly what the function does.

return base_config, search_space


def load_search_space_only(yaml_path):

I wonder if there's a better way to do this somehow. If the search space can't be integrated with the other parameters as suggested above, maybe this is just better as a separate file?

@pplantinga pplantinga (Collaborator) left a comment


I have run the default recipe and there are a few small discrepancies with the original default recipe that seem to result in slightly worse accuracy. The ones I identified were the learning rate and the number of parameters. Let's make sure this achieves the same accuracy and runtime before merge.

# Reload hparams with shape information
with open(hparams_file) as fin:
hparams = load_hyperpyyaml(fin, overrides)


I have found that the conversion of the dataset to a DynamicItemDataset results in a substantial slowdown due to the bandpass filter being applied to each sample on the CPU. One simple remedy for small datasets could be to do the following:

Suggested change
# Convert the dataset to a cached dataset
dataset = InMemoryDataset(dataset)

Of course this requires an additional import at the beginning:

from dataio.datasets import InMemoryDataset
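To make the effect concrete, here is a stand-in for such a cache (illustrative only; the real `InMemoryDataset` in `dataio.datasets` may differ): expensive per-sample work such as the bandpass filter runs once per item, and later epochs read cached results.

```python
class InMemoryDataset:
    """Illustrative cache: eagerly evaluates every item once and serves
    the stored results on all later accesses."""
    def __init__(self, dataset):
        self._items = [dataset[i] for i in range(len(dataset))]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]

class ExpensiveDataset:
    """A dataset whose __getitem__ does costly per-sample work."""
    def __init__(self, data):
        self.data = data
        self.calls = 0

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        self.calls += 1          # stands in for the CPU bandpass filter
        return self.data[idx] * 2

raw = ExpensiveDataset([1, 2, 3])
cached = InMemoryDataset(raw)
_ = [cached[i] for i in range(len(cached))]  # first "epoch"
_ = [cached[i] for i in range(len(cached))]  # second "epoch"
print(raw.calls)  # 3: the filter ran once per sample, not once per epoch
```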

Comment on lines +487 to +495
# except DryRunComplete:
# print("Dry run successful")
# sys.exit(0)
# except Exception as e:
# print(f"Error during execution: {str(e)}")
# if overrides.get("dry_run", False):
# print("Dry run failed")
# sys.exit(1)
# raise

Commented out blocks should be removed before merging.

Comment on lines -92 to -108
def on_fit_start(self,):
"""Gets called at the beginning of ``fit()``"""
self.init_model(self.hparams.model)
self.init_optimizers()
in_shape = (
(1,)
+ tuple(np.floor(self.hparams.input_shape[1:-1]).astype(int))
+ (1,)
)
model_summary = summary(
self.hparams.model, input_size=in_shape, device=self.device
)
with open(
os.path.join(self.hparams.exp_dir, "model.txt"), "w"
) as text_file:
text_file.write(str(model_summary))


I don't think this ought to be removed; it still seems useful. The in_shape will have to be changed, though: the order of the two inner dimensions is reversed.
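A sketch of the adjusted shape computation for the summary call, with the two inner dimensions swapped; the example input shape is hypothetical and should be checked against the repo's actual convention.

```python
import math

def summary_input_shape(input_shape):
    """Build the (1, ..., 1) shape for the model summary, reversing the
    two inner dimensions as suggested (assumed convention)."""
    inner = [int(math.floor(d)) for d in input_shape[1:-1]]
    inner.reverse()  # swap the order of the two inner dimensions
    return (1, *inner, 1)

print(summary_input_shape((32, 500, 22, 1)))  # (1, 22, 500, 1)
```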

Comment on lines +445 to +459
# if False: # hparams["dry_run"]:
# try:
# # Test forward pass with a batch
# batch = next(iter(datasets["train"]))
# with torch.no_grad():
# brain.compute_forward(batch, sb.Stage.TRAIN)
# logger.info("✓ Dry run successful - model forward pass works")
# raise DryRunComplete("Model validation successful")
# except DryRunComplete:
# raise
# except Exception as e:
# logger.error(f"✗ Dry run failed: {str(e)}")
# raise

# Training

Commented out code should be removed before merge

Comment on lines +247 to +248
# Rest of the method remains the same as it handles metrics and checkpointing
# which don't need to change for the new data format

"remains the same" as what? Whoever is reading this likely won't care what it looked like before.

Authors
-------
Victor Cruz, 2025
(Based on original work by Davide Borra and Mirco Ravanelli)

Our typical practice in SpeechBrain so far has been to leave the original authors and just add new authors as they edit the file.
