Wanted to push this early so implementation can be discussed. This required more changes than I anticipated.
It needs many unit tests.
It is blocked from being fully implemented until we can get the other stages to accept the quivr tables as inputs / outputs.
What this adds:
- `checkpoint_dir`, which is used for storing config and checkpointed data files. Intermediate results are written to disk if it is specified.
- `compare_configs` function that checks that a checkpointed instance is using the original config, with an override boolean `allow_config_override`. Only runs when `checkpoint_dir` is being used.
- `LinkTestOrbitStageResult`, which contains references to result[s], a name which describes the stage result, and optional path[s] to the data products on disk. This allows a dynamic caller of `link_test_orbit` to know what type of results are being yielded back; it can analyze the results in memory if it chooses and know that the result files are ready to store elsewhere if `checkpoint_dir` is being used.
- `load_initial_checkpoint_values` is run near the beginning to check the state of things. It will assign the current `CheckpointData` based on what it sees. This requires adding control flow to `link_test_orbit` to always check what stage the `checkpoint: CheckpointData` is at. It also requires updating the `checkpoint` after each stage so that the following stages are run.

Additional thoughts:
There is a bit of a game of ping pong with `use_ray` and whether we are passing `ObjectRef` or the objects themselves. Even if we always use ray and get rid of that boolean, there will be some of this. The checkpointing is also going to suffer from this a bit as it becomes the main container to move the inputs along the pipeline. We anticipated this and I'm not sure there is a clearly correct solution. I suggest we push forward with it until everything is updated to use quivr tables at the edges and checkpointing is complete; then a pattern will hopefully emerge.
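
To make the checkpoint control flow described under "What this adds" easier to discuss, here is a rough, self-contained sketch. It is not the code in this PR. The names `checkpoint_dir`, `compare_configs`, `allow_config_override`, `LinkTestOrbitStageResult`, `load_initial_checkpoint_values`, `CheckpointData` and `link_test_orbit` come from the description above; everything else is an assumption made for illustration: the stage names, the dataclass fields, the `run_stage` placeholder, and the use of JSON files in place of quivr tables.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Iterator, Optional

# Hypothetical stage names; the real pipeline's stages will differ.
STAGES = ["stage_one", "stage_two", "stage_three"]


@dataclass
class CheckpointData:
    stage: str            # next stage to run, or "complete"
    data: Any = None      # output of the last finished stage, if any


@dataclass
class LinkTestOrbitStageResult:
    name: str             # describes which stage produced the result
    result: tuple         # in-memory result[s]
    path: tuple = ()      # optional path[s] to data products on disk


def compare_configs(config: dict, checkpoint_dir: str,
                    allow_config_override: bool = False) -> None:
    """Refuse to resume with a different config unless overriding is allowed."""
    config_path = Path(checkpoint_dir) / "config.json"
    if config_path.exists():
        saved = json.loads(config_path.read_text())
        if saved != config and not allow_config_override:
            raise ValueError("Config differs from the checkpointed config.")
    config_path.write_text(json.dumps(config))


def load_initial_checkpoint_values(checkpoint_dir: Optional[str]) -> CheckpointData:
    """Inspect the checkpoint directory and decide where to resume."""
    data = None
    if checkpoint_dir is not None:
        for stage in STAGES:
            path = Path(checkpoint_dir) / f"{stage}.json"
            if not path.exists():
                return CheckpointData(stage=stage, data=data)
            data = json.loads(path.read_text())
        return CheckpointData(stage="complete", data=data)
    return CheckpointData(stage=STAGES[0])


def run_stage(stage: str, data: list) -> list:
    # Placeholder for real work: just record that the stage ran.
    return data + [stage]


def link_test_orbit(observations: list, config: dict,
                    checkpoint_dir: Optional[str] = None,
                    allow_config_override: bool = False
                    ) -> Iterator[LinkTestOrbitStageResult]:
    if checkpoint_dir is not None:
        Path(checkpoint_dir).mkdir(parents=True, exist_ok=True)
        compare_configs(config, checkpoint_dir, allow_config_override)

    checkpoint = load_initial_checkpoint_values(checkpoint_dir)
    data = observations if checkpoint.data is None else checkpoint.data

    for i, stage in enumerate(STAGES):
        if checkpoint.stage != stage:
            continue  # this stage finished in an earlier run
        data = run_stage(stage, data)
        path = ()
        if checkpoint_dir is not None:
            out = Path(checkpoint_dir) / f"{stage}.json"
            out.write_text(json.dumps(data))
            path = (str(out),)
        # Advance the checkpoint so the following stage runs.
        next_stage = STAGES[i + 1] if i + 1 < len(STAGES) else "complete"
        checkpoint = CheckpointData(stage=next_stage, data=data)
        yield LinkTestOrbitStageResult(name=stage, result=(data,), path=path)
```

A caller can iterate over `link_test_orbit(...)`, inspect each yielded result by `name`, and copy the files listed in `path` elsewhere when `checkpoint_dir` is set. Rerunning with the same `checkpoint_dir` skips any stage whose output file already exists.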