[DRAFT] Protocol redesign #57
base: develop-eeg
Conversation
My initial thoughts after a 30-second skim... I have not checked the experiment runner or train files yet.
…g train from CLI maintained.
… other searches. Added search space items to the EEGNet.yaml file, added search.py in utils, added run_sweep.py.
Here are some initial comments from a cursory review. Things look generally good, but I wonder if the definition of the hyperparameter sweep configuration might be worth some further thought.
As noted in the meeting this could also support weights and biases.
```
@@ -165,3 +215,29 @@ model: !new:models.EEGNet.EEGNet
    dense_max_norm: !ref <dense_max_norm>
    dropout: !ref <dropout>
    dense_n_neurons: !ref <n_classes>

# Search Space
```
I like the search space concept!
I wonder if there's a way to reference the search space in the rest of the yaml file, like:

```yaml
dropout: !ref <search_space.dropout>
```
This might make it slightly clearer how it works than having to use overrides. You might still have to reload the file each time, depending on how the reference worked exactly.
```python
class ExperimentRunner:
    """Manages multiple MOABB experiment runs."""

    def __init__(self, args: list):
```
Maybe the `__init__` can take the relevant arguments directly, and you can add a function like `from_commandline_args` to parse the command line and construct an object, e.g.:

```python
def __init__(self, hparams, data_folder, output_folder, nsubj, nsess):
    """do input validation or whatever"""

@classmethod
def from_commandline_args(cls, args):
    """Load arguments from command line"""
    kwargs = cls.parse_args(args)
    return cls(**kwargs)

@staticmethod
def parse_args(args):
    parser = argparse.ArgumentParser()
    parser.add_argument(...)
    ...
    args = parser.parse_args(args)
    return args.__dict__
```
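Filled out as a runnable sketch (the argument names come from the suggestion above, but their meanings, defaults, and flags are assumptions):

```python
import argparse


class ExperimentRunner:
    """Sketch of the suggested pattern: an explicit __init__ plus a
    from_commandline_args classmethod that parses argv and constructs
    the object.  Defaults and flag names are hypothetical."""

    def __init__(self, hparams, data_folder, output_folder, nsubj=1, nsess=1):
        # Input validation would go here.
        self.hparams = hparams
        self.data_folder = data_folder
        self.output_folder = output_folder
        self.nsubj = nsubj
        self.nsess = nsess

    @classmethod
    def from_commandline_args(cls, args):
        """Construct a runner from a list of command-line tokens."""
        return cls(**cls.parse_args(args))

    @staticmethod
    def parse_args(args):
        parser = argparse.ArgumentParser()
        parser.add_argument("--hparams", required=True)
        parser.add_argument("--data_folder", required=True)
        parser.add_argument("--output_folder", required=True)
        parser.add_argument("--nsubj", type=int, default=1)
        parser.add_argument("--nsess", type=int, default=1)
        return vars(parser.parse_args(args))


runner = ExperimentRunner.from_commandline_args(
    ["--hparams", "EEGNet.yaml", "--data_folder", "data", "--output_folder", "out"]
)
```

A nice side effect of this split is that tests can build an `ExperimentRunner` directly without going through argument parsing at all.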
```python
    """
class MOABBBrain(sb.Brain):
    """Modified Brain class for MOABB experiments with new data format."""

    def init_model(self, model):
```
Maybe model initialization should be done in the model files themselves?
```
@@ -253,174 +387,109 @@ def check_if_best(


def run_experiment(hparams, run_opts, datasets):
    """This function performs a single training (e.g., single cross-validation fold)"""
    idx_examples = np.arange(datasets["train"].dataset.tensors[0].shape[0])
    """Run a single experiment with the new data format."""
```
"New" data format may not mean anything to someone reading this file. Perhaps you can either explicitly write out the data format, or just state more clearly what the function does.
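As a hedged illustration, the docstring could spell out the expected structure instead (the field names and shapes below are made up and should be replaced with the real pipeline's format):

```python
def run_experiment(hparams, run_opts, datasets):
    """Run a single training fold (e.g., one cross-validation fold).

    Arguments
    ---------
    datasets : dict
        Expected keys: "train", "valid", "test".  Each value is an
        iterable yielding (eeg_signals, labels) batches.  (Illustrative;
        document the actual fields produced by the dataio pipeline.)
    """
```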
```python
    return base_config, search_space


def load_search_space_only(yaml_path):
```
I wonder if there's a better way to do this somehow. If the search space can't be integrated with the other parameters as suggested above, maybe this is just better as a separate file?
I have run the default recipe, and there are a few small discrepancies with the original default recipe that seem to result in slightly worse accuracy. The ones I identified were the learning rate and the number of parameters. Let's make sure this achieves the same accuracy and runtime before merge.
```python
# Reload hparams with shape information
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)
```
I have found that the conversion of the dataset to a DynamicItemDataset results in a substantial slowdown due to the bandpass filter being applied to each sample on the CPU. One simple remedy for small datasets could be to do the following:
```python
# Convert the dataset to a cached dataset
dataset = InMemoryDataset(dataset)
```

Of course this requires an additional import at the beginning:

```python
from dataio.datasets import InMemoryDataset
```
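To make the idea concrete, here is a minimal pure-Python sketch of what such an eager in-memory cache could look like (the real `InMemoryDataset` in `dataio.datasets` may differ; the "slow" dataset below is a stand-in for the per-sample bandpass filter):

```python
class SlowBandpassDataset:
    """Hypothetical map-style dataset whose __getitem__ applies an
    expensive per-sample transform; used only to show the caching effect."""

    def __init__(self, samples):
        self.samples = samples
        self.filter_calls = 0

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        self.filter_calls += 1  # stands in for the CPU bandpass filter
        return self.samples[idx] * 2


class InMemoryDataset:
    """Eagerly materialize every item once, so the expensive transform
    runs a single time per sample instead of once per epoch."""

    def __init__(self, dataset):
        self._items = [dataset[i] for i in range(len(dataset))]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]


raw = SlowBandpassDataset([1, 2, 3])
cached = InMemoryDataset(raw)
for _ in range(5):  # five "epochs" over the cached dataset
    _ = [cached[i] for i in range(len(cached))]
```

The trade-off is memory: this only makes sense for datasets small enough to hold fully transformed in RAM, which is typically the case for MOABB-scale EEG data.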
```python
# except DryRunComplete:
#     print("Dry run successful")
#     sys.exit(0)
# except Exception as e:
#     print(f"Error during execution: {str(e)}")
#     if overrides.get("dry_run", False):
#         print("Dry run failed")
#         sys.exit(1)
#     raise
```
Commented-out blocks should be removed before merging.
```python
def on_fit_start(self,):
    """Gets called at the beginning of ``fit()``"""
    self.init_model(self.hparams.model)
    self.init_optimizers()
    in_shape = (
        (1,)
        + tuple(np.floor(self.hparams.input_shape[1:-1]).astype(int))
        + (1,)
    )
    model_summary = summary(
        self.hparams.model, input_size=in_shape, device=self.device
    )
    with open(
        os.path.join(self.hparams.exp_dir, "model.txt"), "w"
    ) as text_file:
        text_file.write(str(model_summary))
```
I don't think this ought to be removed; it still seems useful. The `in_shape` will have to be changed though: the order of the two inner dimensions is reversed.
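A small self-contained sketch of that reversal (the shape values are made up; the real `input_shape` comes from the hparams file):

```python
# Hypothetical hparams.input_shape of (batch, time, channels, 1):
input_shape = (None, 500, 22, 1)

# The current code keeps the inner dimensions in file order:
inner = [int(d) for d in input_shape[1:-1]]

# With the new data layout the two inner dimensions swap, so reverse them
# before passing the shape to the model summary:
in_shape = (1,) + tuple(reversed(inner)) + (1,)
print(in_shape)  # -> (1, 22, 500, 1)
```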
```python
# if False:  # hparams["dry_run"]:
#     try:
#         # Test forward pass with a batch
#         batch = next(iter(datasets["train"]))
#         with torch.no_grad():
#             brain.compute_forward(batch, sb.Stage.TRAIN)
#         logger.info("✓ Dry run successful - model forward pass works")
#         raise DryRunComplete("Model validation successful")
#     except DryRunComplete:
#         raise
#     except Exception as e:
#         logger.error(f"✗ Dry run failed: {str(e)}")
#         raise

# Training
```
Commented-out code should be removed before merge.
```python
# Rest of the method remains the same as it handles metrics and checkpointing
# which don't need to change for the new data format
```
"remains the same" as what? Whoever is reading this likely won't care what it looked like before.
```
Authors
-------
Victor Cruz, 2025
(Based on original work by Davide Borra and Mirco Ravanelli)
```
Our typical practice in SpeechBrain so far has been to leave the original authors and just add new authors as they edit the file.
This pull request integrates the new dataio files and paradigms. It aims to replace the bash files that run the experiments and the hyperparameter optimization with new Python files. The main changes are to the train.py file, which now loads the data using the dataio files; a new run_experiments.py file; and, later, a new run_hparam_optimization.py file that integrates with WANDB.