Use-case: An experiment terminates, and I now want to resume training from a saved checkpoint. The resumed run should reuse the initial run's experiment directory, since the model checkpoints (and, for example, TensorFlow summaries) ideally live inside the previous experiment's folder.
Terminal output should either be appended to the previous output.txt or written to a newly created output2.txt, and so on.
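A minimal sketch of the two logging options for a resumed run, assuming the experiment directory is a plain local path. The helper names `next_output_path` and `open_terminal_log` are hypothetical, not part of any existing API: appending reuses output.txt, while the alternative picks the first unused name in the sequence output.txt, output2.txt, output3.txt, and so on.

```python
import os


def next_output_path(exp_dir: str, stem: str = "output", ext: str = ".txt") -> str:
    """Hypothetical helper: first unused path among output.txt, output2.txt, output3.txt, ..."""
    candidate = os.path.join(exp_dir, stem + ext)
    n = 2
    while os.path.exists(candidate):
        candidate = os.path.join(exp_dir, f"{stem}{n}{ext}")
        n += 1
    return candidate


def open_terminal_log(exp_dir: str, append: bool = True):
    """Hypothetical helper: open the terminal log for a (possibly resumed) run.

    append=True reuses output.txt in append mode, so earlier output is kept;
    append=False starts a fresh outputN.txt next to the existing logs.
    """
    os.makedirs(exp_dir, exist_ok=True)
    if append:
        return open(os.path.join(exp_dir, "output.txt"), "a")
    return open(next_output_path(exp_dir), "w")
```

Either way the log stays in the same experiment directory as the checkpoints, which is the point of the use-case above.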
One way would be to create a PersistentExperiment base class, with an __init__ and an abstract "run_step" method, that experiments extend. Its "run" method would repeatedly call run_step while either periodically saving checkpoints or catching keyboard interrupts and saving at that point.
A small obstacle is that everything in your experiment needs to be picklable (so no lambda functions, etc.).