Conversation

@vmcru vmcru (Contributor) commented Apr 7, 2025

This pull request integrates the new dataio files and paradigms. It replaces the bash scripts that run the experiment and the hyperparameter optimization with new Python files. The main changes are the train.py file, which now loads the data using the dataio files, a new run_experiments.py file, and, later, a new run_hparam_optimization.py file that integrates with Weights & Biases (wandb).

@Drew-Wagner Drew-Wagner (Collaborator) left a comment


My initial thoughts after a 30-second skim... I have not checked the experiment runner or train files yet.

@pplantinga pplantinga (Collaborator) left a comment


Here are some initial comments from a cursory review. Things look generally good, but I wonder if the definition of the hyperparameter sweep configuration might be worth some further thought.

As noted in the meeting, this could also support Weights & Biases.
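For reference, a Weights & Biases sweep is usually described by a small configuration dict. Below is a minimal sketch for hyperparameters like those in this recipe; the parameter names, ranges, and metric name are illustrative assumptions, not taken from the repo.

```python
# Hypothetical wandb sweep configuration; only the structure is the point.
# With the wandb library this would be registered via wandb.sweep(sweep_config),
# and each trial would read its sampled values from wandb.config.
sweep_config = {
    "method": "bayes",  # or "grid" / "random"
    "metric": {"name": "valid_acc", "goal": "maximize"},
    "parameters": {
        "dropout": {"min": 0.0, "max": 0.5},
        "lr": {"values": [1e-3, 5e-4, 1e-4]},
    },
}
print(sorted(sweep_config["parameters"]))  # ['dropout', 'lr']
```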

@@ -165,3 +215,29 @@ model: !new:models.EEGNet.EEGNet
dense_max_norm: !ref <dense_max_norm>
dropout: !ref <dropout>
dense_n_neurons: !ref <n_classes>

# Search Space

I like the search space concept!

I wonder if there's a way to reference the search space in the rest of the yaml file -- like:

dropout: !ref <search_space.dropout>


This might make it slightly clearer how this works, rather than having to use overrides. You might still have to reload the file each time, depending on how the reference works exactly.
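To illustrate the idea, here is a toy resolver for nested `<a.b>` references in a plain dict. It only sketches the behavior being proposed; hyperpyyaml's actual `!ref` syntax and resolution rules may differ.

```python
import re

def resolve_refs(config: dict) -> dict:
    """Toy stand-in for reference resolution (illustrative only): replaces
    "<a.b>" placeholder strings with values looked up inside the config."""
    def lookup(path):
        node = config
        for part in path.split("."):
            node = node[part]
        return node

    resolved = {}
    for key, value in config.items():
        if isinstance(value, str):
            match = re.fullmatch(r"<(.+)>", value)
            if match:
                value = lookup(match.group(1))
        resolved[key] = value
    return resolved

config = {
    "search_space": {"dropout": 0.25},
    "dropout": "<search_space.dropout>",  # resolves against search_space
}
print(resolve_refs(config)["dropout"])  # 0.25
```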

class ExperimentRunner:
"""Manages multiple MOABB experiment runs."""

def __init__(self, args: list):

maybe `__init__` can take the relevant arguments directly, and you can add a classmethod like `from_commandline_args` to parse the command line and construct the object, e.g.

def __init__(self, hparams, data_folder, output_folder, nsubj, nsess):
    """Do input validation or whatever."""

@classmethod
def from_commandline_args(cls, args):
    """Construct an instance from command-line arguments."""
    kwargs = cls.parse_args(args)
    return cls(**kwargs)

@staticmethod
def parse_args(args):
    parser = argparse.ArgumentParser()
    parser.add_argument(...)
    ...
    args = parser.parse_args(args)
    return vars(args)


"""
class MOABBBrain(sb.Brain):
"""Modified Brain class for MOABB experiments with new data format."""

def init_model(self, model):

Maybe model initialization should be done in the model files themselves?
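One common way to do this is to give the model a `reset_parameters` method, following the PyTorch convention, so initialization lives in the model file rather than in a Brain hook. The sketch below is illustrative only: the class name and init scheme are hypothetical, and real code would initialize torch tensors.

```python
class EEGModelSketch:
    """Hypothetical model class: initialization lives with the model itself
    via a reset_parameters method (the PyTorch convention), rather than in
    an external Brain.init_model hook."""

    def __init__(self, n_features: int):
        self.n_features = n_features
        self.weights = None
        self.reset_parameters()  # the model initializes itself

    def reset_parameters(self):
        # Placeholder uniform init; a real model would initialize tensors.
        bound = 1.0 / self.n_features
        self.weights = [bound] * self.n_features

model = EEGModelSketch(4)
print(model.weights)  # [0.25, 0.25, 0.25, 0.25]
```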

@@ -253,174 +387,109 @@ def check_if_best(


def run_experiment(hparams, run_opts, datasets):
"""This function performs a single training (e.g., single cross-validation fold)"""
idx_examples = np.arange(datasets["train"].dataset.tensors[0].shape[0])
"""Run a single experiment with the new data format."""

"New" data format may not mean anything to someone reading this file. Perhaps you can either explicitly write out the data format, or just state more clearly what the function does.

return base_config, search_space


def load_search_space_only(yaml_path):

I wonder if there's a better way to do this somehow. If the search space can't be integrated with the other parameters as suggested above, maybe this is just better as a separate file?

@pplantinga pplantinga (Collaborator) left a comment


I have run the default recipe and there are a few small discrepancies with the original default recipe that seem to result in slightly worse accuracy. The ones I identified were the learning rate and the number of parameters. Let's make sure this achieves the same accuracy and runtime before merge.

# Reload hparams with shape information
with open(hparams_file) as fin:
hparams = load_hyperpyyaml(fin, overrides)


I have found that the conversion of the dataset to a DynamicItemDataset results in a substantial slowdown due to the bandpass filter being applied to each sample on the CPU. One simple remedy for small datasets could be to do the following:

Suggested change
# Convert the dataset to a cached dataset
dataset = InMemoryDataset(dataset)

Of course this requires an additional import at the beginning:

from dataio.datasets import InMemoryDataset
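To make the effect concrete, here is a stand-in for such a cache (illustrative only; the real `InMemoryDataset` in `dataio.datasets` may differ): expensive per-sample work such as the bandpass filter runs once per item, and later epochs read cached results.

```python
class InMemoryDataset:
    """Illustrative cache: eagerly evaluates every item once and serves
    the stored results on all later accesses."""
    def __init__(self, dataset):
        self._items = [dataset[i] for i in range(len(dataset))]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]

class ExpensiveDataset:
    """A dataset whose __getitem__ does costly per-sample work."""
    def __init__(self, data):
        self.data = data
        self.calls = 0

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        self.calls += 1          # stands in for the CPU bandpass filter
        return self.data[idx] * 2

raw = ExpensiveDataset([1, 2, 3])
cached = InMemoryDataset(raw)
_ = [cached[i] for i in range(len(cached))]  # first "epoch"
_ = [cached[i] for i in range(len(cached))]  # second "epoch"
print(raw.calls)  # 3: the filter ran once per sample, not once per epoch
```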

Comment on lines +487 to +495
# except DryRunComplete:
# print("Dry run successful")
# sys.exit(0)
# except Exception as e:
# print(f"Error during execution: {str(e)}")
# if overrides.get("dry_run", False):
# print("Dry run failed")
# sys.exit(1)
# raise

Commented out blocks should be removed before merging.

Comment on lines -92 to -108
def on_fit_start(self,):
"""Gets called at the beginning of ``fit()``"""
self.init_model(self.hparams.model)
self.init_optimizers()
in_shape = (
(1,)
+ tuple(np.floor(self.hparams.input_shape[1:-1]).astype(int))
+ (1,)
)
model_summary = summary(
self.hparams.model, input_size=in_shape, device=self.device
)
with open(
os.path.join(self.hparams.exp_dir, "model.txt"), "w"
) as text_file:
text_file.write(str(model_summary))


I don't think this ought to be removed; it still seems useful. The in_shape will have to be changed, though: the order of the two inner dimensions is reversed.
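A sketch of the adjusted shape computation for the summary call, with the two inner dimensions swapped; the example input shape is hypothetical and should be checked against the repo's actual convention.

```python
import math

def summary_input_shape(input_shape):
    """Build the (1, ..., 1) shape for the model summary, reversing the
    two inner dimensions as suggested (assumed convention)."""
    inner = [int(math.floor(d)) for d in input_shape[1:-1]]
    inner.reverse()  # swap the order of the two inner dimensions
    return (1, *inner, 1)

print(summary_input_shape((32, 500, 22, 1)))  # (1, 22, 500, 1)
```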

Comment on lines +445 to +459
# if False: # hparams["dry_run"]:
# try:
# # Test forward pass with a batch
# batch = next(iter(datasets["train"]))
# with torch.no_grad():
# brain.compute_forward(batch, sb.Stage.TRAIN)
# logger.info("✓ Dry run successful - model forward pass works")
# raise DryRunComplete("Model validation successful")
# except DryRunComplete:
# raise
# except Exception as e:
# logger.error(f"✗ Dry run failed: {str(e)}")
# raise

# Training

Commented out code should be removed before merge

Comment on lines +247 to +248
# Rest of the method remains the same as it handles metrics and checkpointing
# which don't need to change for the new data format

"remains the same" as what? Whoever is reading this likely won't care what it looked like before.

Authors
-------
Victor Cruz, 2025
(Based on original work by Davide Borra and Mirco Ravanelli)

Our typical practice in SpeechBrain so far has been to leave the original authors and just add new authors as they edit the file.
