[dagster-k8s] working op mutating executor #3
## Summary & Motivation
Currently you aren't allowed to dynamically orchestrate a new op in a job executor from a prior op's output, because Dagster draws a pretty hard line between a `StepOrchestrationContext` and a `StepExecutionContext`. I initially wanted to simply pass in a custom `StepLauncher` as a resource that would handle this for me; moving forward, that level of abstraction is probably the sanest way to implement this feature. Unfortunately, it would require a much more intrusive rewrite of the current `K8sStepHandler`, as opposed to the simpler inheritance I have here, and I'm also unsure whether it's even possible. To avoid multiple, potentially expensive io_manager calls, we cache step inputs; that way, while a step is running and polling `_get_container_context`, we don't reinitialize on every call. We save memory by dropping the entry once the step is terminated.

I considered the following 3 approaches to this problem:

### 1. Downstream `K8sDownstreamOpSpecPipe` Resource

I created the `K8sDownstreamOpSpecPipe`, which takes an upstream op name to identify its downstream steps; the upstream op uses this resource to log the configs it wants to pass on. Here is a simple example of how this worked:
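(A minimal sketch with the resource stubbed out; the `pipe_config` method and the resource key are assumed names, not the exact API.)

```python
from dagster import job, op, resource


@resource
def k8s_spec_pipe(_init_context):
    # Stand-in for this PR's K8sDownstreamOpSpecPipe; the real resource
    # emits an engine event carrying the config, attached to the
    # downstream step.
    class _Pipe:
        def pipe_config(self, downstream_op, k8s_config):
            ...

    return _Pipe()


@op(required_resource_keys={"k8s_spec_pipe"})
def upstream(context):
    # Log the k8s config the named downstream op should run under.
    context.resources.k8s_spec_pipe.pipe_config(
        downstream_op="downstream",
        k8s_config={
            "container_config": {
                "resources": {"limits": {"cpu": "2", "memory": "4Gi"}}
            }
        },
    )
    return 1


@op
def downstream(value):
    return value + 1


@job(resource_defs={"k8s_spec_pipe": k8s_spec_pipe})
def pipe_job():
    downstream(upstream())
```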
Under the hood, this would emit a `DagsterEngineEvent` that stored the k8s configs as `EngineEventData`. It would store the log on the downstream step itself, so querying for the config would be super fast for the step handler; I created some custom queries to efficiently pull step configs out of the event log. The way I arrived at this solution was that I initially planned to create a custom schema for passing config down to my mutating executor, and as I developed the schema I noticed it looked a lot like the event schema, which is pretty generalizable.
Problems:
`DagsterEventType` is probably a postgres enum under the hood, so I couldn't extend it the way I wanted to for efficient log queries.

### 2. Output Metadata Consumption
This is very similar to the approach I went with: use output metadata, if present, to configure the downstream op. Because we wouldn't want to eagerly load resources we didn't need, I checked the op tags and only eagerly loaded parent steps. The shape of the idea is sketched below.
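(In sketch form; the metadata key is my assumption.)

```python
from dagster import Output, op


@op
def upstream():
    # Approach 2 idea: ride the k8s config along as output metadata so
    # the step handler can configure the consuming step.
    return Output(
        1,
        metadata={
            "k8s_config": {
                "container_config": {"resources": {"limits": {"cpu": "2"}}}
            }
        },
    )
```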
The main problem was that, after a while of hacking on it, I couldn't cleanly get output metadata from an input context. I had to do some hacky nonsense to pull the metadata out of the event log by way of output handles and log queries, so I marked this approach a failure and moved on.
### 3. Custom Output Wrapper Type (Implemented)
I created a custom dagster type called `K8sOpMutatingOutput`, a wrapper that can accept a value and a k8s config. The advantage of this method is that I don't need any extra config in my op tags: as soon as an op consumes an input of this type, I can recognize it and naturally resolve runtime configs in a way that's immediately apparent from the job definition (by observing an op's inputs).
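A minimal sketch of the usage, with the wrapper stood in by a plain dataclass of the assumed shape:

```python
from dataclasses import dataclass, field
from typing import Any

from dagster import job, op


# Stand-in with the same shape as this PR's K8sOpMutatingOutput (the
# real class is registered as a custom dagster type).
@dataclass
class K8sOpMutatingOutput:
    value: Any
    k8s_config: dict = field(default_factory=dict)


@op
def upstream() -> K8sOpMutatingOutput:
    # Wrap the value together with the k8s config its consumer should
    # run under.
    return K8sOpMutatingOutput(
        value=42,
        k8s_config={
            "container_config": {"resources": {"limits": {"memory": "8Gi"}}}
        },
    )


@op
def downstream(wrapped: K8sOpMutatingOutput) -> int:
    # The mutating step handler spots this input type at orchestration
    # time and patches the step's pod spec before launching it; the op
    # body itself just sees the wrapper.
    return wrapped.value * 2


@job
def pod_mutating_job():
    downstream(upstream())
```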
Problems:

- We fake a step execution context, which means the op's required resources are loaded. If these resources have destructive side effects, this could lead to unintended consequences or miscounting/mislabeling: the step handler now instantiates an execution context in the run launcher pod, not the op's run pod, so the resources get instantiated twice. With this enhancement I'd consider that fair, although technically the same problem exists for custom io managers.
- During my stress tests I didn't notice a particularly bad hit to step handler performance, but conceivably this could get bad, especially with the implementation as it stands today, where a step's entire execution context is built for eager loading.
- If the value you are wrapping is particularly large, this will impact step handler performance.

The reason we can use this is that, for the steps we employ it on, our resources have no destructive or slow side effects and the input payloads are minimal.
## Other features

We cache `K8sContainerContext`s. This contributes slightly to the step handler's memory overhead, but is well worth not resolving the same inputs multiple times; the sketch below shows the pattern. A potential enhancement for outputs that are used repeatedly would be an LRU cache for rendering particular op output configs as well.
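(In sketch form; names are illustrative, not the PR's exact API.)

```python
from typing import Any, Callable, Dict


class StepInputCache:
    """Resolve a step's container context once, reuse it across
    _get_container_context polls, and evict when the step terminates."""

    def __init__(self) -> None:
        self._by_step_key: Dict[str, Any] = {}

    def get_or_resolve(self, step_key: str, resolve: Callable[[], Any]) -> Any:
        # `resolve` is the expensive path (io_manager loads plus config
        # rendering); it runs at most once per live step.
        if step_key not in self._by_step_key:
            self._by_step_key[step_key] = resolve()
        return self._by_step_key[step_key]

    def evict(self, step_key: str) -> None:
        # Called when the step terminates so memory doesn't grow with
        # run history.
        self._by_step_key.pop(step_key, None)
```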
We also get neat logging confirming that inputs got picked up, emitted on the op being configured.
## How I Tested These Changes

to recreate an accurate step handler context. Run `kubectl get pod <op run pod> -oyaml` and observe proper propagation of k8s configs.
My goal here is to at least gather feedback from the team on this code and its potential to be upstreamed. I'd love to have this live in the dagster repo, but because of its limitations, I understand if it crosses one too many boundaries. My rationale for why I think this is not a violation of responsibility is the precedent set by dynamic outputs: if we are OK with a varying number of ops in our graph, I think we can be OK with them manipulating the state of said ops. Having completed this exercise, I can see why these might be two separate concerns, though. Would love to hear any and all feedback.