
Basic reproducibility features #28


Open
jewellsean opened this issue Jun 11, 2014 · 1 comment

Comments

@jewellsean

It would be incredibly useful to have some additional run information recorded to ensure results are reproducible. Sumatra does a good job of deciding which details to log.

In particular, storing the SHA-1 of input and config files would help identify modifications between the first execution and the first reproduction attempt. Another feature needed for reproducibility is storing the git HEAD (if the executable code lives in a git repo).
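As a rough illustration of the kind of record meant here, the following is a minimal Python sketch, not anything 3X or Sumatra actually does. The function names (`sha1_of_file`, `git_head`, `run_record`) and the shape of the returned dictionary are hypothetical; the only external assumption is that the `git` command is on the PATH when the code lives in a git repo.

```python
import hashlib
import subprocess


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_head(repo_dir="."):
    """Return the commit id of HEAD, or None if it cannot be determined
    (not a git repo, no commits yet, or git is not installed)."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def run_record(files, repo_dir="."):
    """Collect per-run metadata (hypothetical layout): one SHA-1 per
    input/config file plus the git HEAD of the code directory."""
    return {
        "sha1": {str(p): sha1_of_file(p) for p in files},
        "git_head": git_head(repo_dir),
    }
```

Comparing the `sha1` entries of two such records would show which input or config files changed between the first execution and a later reproduction attempt.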

@netj
Owner

netj commented Jun 11, 2014

Thanks for your suggestions @jewellsean!

If you decide to store or symlink the code inside the 3X experiment repository (most likely in the program directory), then 3X keeps a copy of it for every run. So it's already possible to retrieve the exact version of the code used for an individual run, although you need to go through additional steps to find the corresponding commit in your separate git repository. I agree that storing and displaying the git commit id would be handier, but it seems many people simply want to run code without committing (or, more precisely, without committing their modifications/tweaks), or they have their own version control policy. So we decided to make 3X keep all the copies, which works independently of the underlying version control system.
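To make the idea concrete, here is a minimal Python sketch of a per-run snapshot that does not depend on any version control system. It is only an illustration and not 3X's actual implementation; the function name `snapshot_program` and the `runs/<run_id>/program` layout are hypothetical.

```python
import shutil
from pathlib import Path


def snapshot_program(program_dir, runs_dir, run_id):
    """Copy the program directory into a per-run folder (hypothetical layout).

    With symlinks=False, symlinked files are copied by content, so the
    snapshot stays valid even if the link target changes later.
    """
    dest = Path(runs_dir) / str(run_id) / "program"
    shutil.copytree(program_dir, dest, symlinks=False)
    return dest
```

Because the copy is taken at run time, uncommitted tweaks are captured as well, which a recorded commit id alone would miss.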

Identifying the best practices for reproducibility and letting 3X provide a good default that uses git to record as much as possible would still be very interesting and important. In relation to managing and evolving the metadata/schema for an experiment (the inputs, outputs, and the program code), I'm thinking of tightly coupling 3X with git, but it's still an open question how exactly we should do it. Please comment if you have more specific ideas or examples of how you would want to use 3X in your workflow.
