Locking mechanism to prevent two or more workflows/tasks running in parallel #3754
Replies: 9 comments 4 replies
-
@edvardm thank you for the request, do you think this would work: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/flyte_basics/task_cache_serialize.html
-
That would not work, unfortunately, for two reasons. One is that I would not want to queue a task until another is finished, but rather prevent execution of / skip the just-started task if one is already running. The other is that using the cache would just waste resources here, as I'd pretty much never get hits. So the suggestion you made in #267 (comment) is spot on.
-
I think this is great! It could be really powerful. To add a little more context: the cache serialize work above rests on the premise that each task execution is uniquely identified by project / domain / task id / input values / cache version. Concurrent executions are serialized using this cache key, and it works great: only a single instance of a cached task will run at a time, and the others reuse the cached results rather than computing them separately. This scheme may or may not be extensible to this use-case, depending on scope.
A previous proposal (which I can't seem to find) discussed applying generalized serialization to tasks. IMO scope is the largest unknown here. If we just want to add a simple serialize behavior flag at the task level, that would be pretty simple. However, that solution is not nearly as ambitious as this proposal. I would certainly support an RFC to explore this in more depth and would be very happy to help with implementation.
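For reference, cache serialization is exposed in flytekit through the `cache_serialize` task option; a minimal sketch (the task body and `cache_version` value are just placeholders):
```python
from flytekit import task

# With cache=True and cache_serialize=True, identical executions of this task
# (same project / domain / task id / inputs / cache_version) are serialized:
# one instance runs, the rest wait and then reuse the cached result.
@task(cache=True, cache_serialize=True, cache_version="1.0")
def expensive_transform(x: int) -> int:
    return x * 2
```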
-
I think most folks want serial execution of scheduled launch plans. That is a much simpler and better-scoped problem to solve than global serialization.
-
Hey!
Use-case 1 (parallel tasks): We have a workflow with a Spark task that writes data to an S3 bucket. The data is appended / dynamically overwritten.
Use-case 2 (concurrent workflows): We have a workflow that contains multiple long-running tasks. The workflow has external dependencies that can arrive with delays, so it is scheduled to run roughly every 20 minutes. Several tasks have different upstream dependencies. For instance, data for Task 1 can arrive earlier than data for Task 2; this means that as soon as data for Task 1 is ready and the schedule kicks in, Task 1 will start executing while Task 2 waits for its input.
-
@edvardm how do you feel about starting an RFC from this discussion?
-
@edvardm from last week's Contributors' meetup: this idea is still considered a good fit for an RFC. You could either work on the proposal yourself, nominate someone else, or let us know if you still want to keep this entry open. Thanks for your support so far :)
-
We had a similar request and recently implemented this through custom agents (LockingAgent and UnlockingAgent). The lock is tracked in our own database, but we had thought about either using the Flyte admin database with a new schema, or etcd. The current implementation simply exits if it can't acquire the lock. We are also looking at another implementation that would act more like a Sensor and wait for its turn in the queue. My point in bringing this up is that maybe agents are a way to move forward with this. I'd love to discuss this more with folks if this is still a desired feature.
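To make the agent idea concrete, here is a minimal sketch of the lock bookkeeping such a LockingAgent might do, assuming a Postgres table `workflow_locks` with a UNIQUE constraint on `lock_name` (the table, the function names, and the exit-on-failure behavior are all assumptions, not the actual implementation):
```python
import psycopg2

def acquire_lock(conn, lock_name: str) -> bool:
    """Try to claim the named lock; False means another holder has it."""
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO workflow_locks (lock_name) VALUES (%s)",
                (lock_name,),
            )
        conn.commit()
        return True
    except psycopg2.errors.UniqueViolation:
        conn.rollback()  # row already exists: someone else holds the lock
        return False

def release_lock(conn, lock_name: str) -> None:
    """Drop the lock row so the next execution can claim it."""
    with conn.cursor() as cur:
        cur.execute("DELETE FROM workflow_locks WHERE lock_name = %s", (lock_name,))
    conn.commit()
```
The LockingAgent would call something like `acquire_lock` and exit if it returns False; the UnlockingAgent would call `release_lock` at the end of the workflow.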
-
10/24/2024 Contributors' sync notes: no active work on this; it still needs an owner to write and shepherd the RFC through the process.
-
Draft for a more elaborate RFC.
There are multiple cases where it is not desirable to have multiple tasks and/or workflows running at the same time. One way to cover all such cases would be to use named, distributed locks, so that only a single process anywhere could hold a given lock.*
An existing related issue, which is very common in ETL pipelines, is well described in #267.
From a user's perspective, I wish I had something along these lines:
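(The sketch below is purely hypothetical; nothing like `with_lock` exists in flytekit today. It is plain Python over a single Redis instance, just to illustrate the desired skip-if-held behavior; the host, key prefix, and TTL are made up.)
```python
import functools
import redis

r = redis.Redis(host="localhost", port=6379)

def with_lock(name: str, ttl_seconds: int = 1800):
    """Skip the wrapped call entirely if another process holds the named lock."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # SET NX succeeds only if nobody holds the lock; the TTL ensures
            # a crashed holder cannot block everyone else forever.
            if not r.set(f"lock:{name}", "1", nx=True, ex=ttl_seconds):
                return None  # already running elsewhere: skip, don't queue
            try:
                return fn(*args, **kwargs)
            finally:
                r.delete(f"lock:{name}")
        return wrapper
    return decorator

@with_lock("my-etl-pipeline")
def run_pipeline() -> None:
    print("doing the work")
```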
I'm not yet sure how global it could be; maybe the lock identifier could be prefixed with the project name by default. Currently I'm intending to resolve this by using Redlock as the distributed lock, likely combined with conditionals in the workflow.
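A rough sketch of the conditional part, using flytekit's real `conditional` construct and a plain single-instance Redis `SET NX` as a stand-in for Redlock (the Redis host, key name, and TTL are assumptions):
```python
import redis
from flytekit import conditional, task, workflow

@task
def try_acquire_lock() -> bool:
    # Stand-in for Redlock: one Redis node, SET NX with a TTL so a crashed
    # execution cannot hold the lock indefinitely.
    r = redis.Redis(host="localhost", port=6379)
    return bool(r.set("lock:my-etl-pipeline", "1", nx=True, px=30 * 60 * 1000))

@task
def run_etl() -> str:
    return "ran"

@task
def skip() -> str:
    return "skipped: another execution holds the lock"

@workflow
def locked_etl() -> str:
    acquired = try_acquire_lock()
    return (
        conditional("lock_gate")
        .if_(acquired.is_true())
        .then(run_etl())
        .else_()
        .then(skip())
    )
```
A real version would also need an explicit unlocking step (or rely on the TTL expiring), which is exactly where the LockingAgent/UnlockingAgent idea above would fit.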
*) Or, even neater, something like semaphores, so that k instances could run at a time; but simple distributed mutexes would be good enough.