Enhance Kedro Deployment #4317
Back in 2022, a small team and I pitched (and prototyped) Exedra (probably misspelled; it may have been Exidra) to provide an intermediate representation for deploying to orchestrators, to help solve the problem of maintaining many deployment plugins. It is probably somewhat along the lines of some of the discussions around separating node grouping from deployment: in that design, Kedro-Exedra would manage node grouping, while Exedra handles deployment. It examined and leveraged the structural similarities between many of the (then-modern) ML workflow tools, including examples for Kubeflow Pipelines/Vertex AI, Azure ML, and SageMaker. IIRC, I also argued pretty clearly that you should just deploy at the modular pipeline level, and that nobody needs to deploy per node.

There have also been other attempts to abstract the second part and to create a more unified deployment language, such as Couler. This could also have helped ease the maintenance burden, but it never took off either. (Also, while the project creator was very open to collaborating and to having people add other orchestration backends or generally improve the project, there wasn't buy-in from QB to invest in this.)

Old related issue: #2058

This was based on what was, IMO, the best way to deploy almost three years ago. If it is still important to be able to deploy to all of these, it is probably still a good starting point. It's also worth noting that many of these tools focus on the data science side, not the data engineering side.

On top of that, there's the question of what the best way to deploy a Kedro pipeline is (e.g. as part of a data platform or broader ecosystem). Say you're an open-source user who wants to productionize your pipeline and has no strong requirements about which tool to use. I think this is more of an open question; I'm also happy to hear if others have seen a best option in some of these cases.
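The "deploy at the modular pipeline level, not per node" argument can be illustrated with a small grouping step: collapse nodes into one deployable task per namespace, and keep only cross-namespace data dependencies as orchestrator-level edges. This is a minimal sketch under stated assumptions, not Exedra's or Kedro's actual design: `Node` is a hypothetical stand-in (not Kedro's `Node` class), and `group_by_namespace` is a hypothetical helper that treats the namespace as the grouping key.

```python
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node; not Kedro's Node class."""
    name: str
    namespace: str          # modular pipeline the node belongs to (assumed grouping key)
    inputs: tuple = ()
    outputs: tuple = ()


def group_by_namespace(nodes):
    """Collapse nodes into one deployable task per namespace.

    Returns (tasks, deps): tasks maps namespace -> node names in input order,
    deps maps namespace -> set of upstream namespaces.
    """
    tasks = defaultdict(list)
    producer = {}  # dataset name -> namespace of the node that produces it
    for n in nodes:
        tasks[n.namespace].append(n.name)
        for out in n.outputs:
            producer[out] = n.namespace

    deps = {ns: set() for ns in tasks}
    for n in nodes:
        for inp in n.inputs:
            upstream = producer.get(inp)
            # Only cross-namespace edges become orchestrator-level dependencies;
            # edges inside a namespace are handled by running the group as a unit.
            if upstream is not None and upstream != n.namespace:
                deps[n.namespace].add(upstream)
    return dict(tasks), deps


nodes = [
    Node("clean", "data_processing", ("raw",), ("clean",)),
    Node("features", "data_processing", ("clean",), ("features",)),
    Node("train", "data_science", ("features",), ("model",)),
    Node("evaluate", "data_science", ("model", "features"), ("metrics",)),
]

tasks, deps = group_by_namespace(nodes)
# static_order() yields each group after all of its upstream groups.
order = list(TopologicalSorter(deps).static_order())
print(tasks)
print(order)  # ['data_processing', 'data_science']
```

Each entry in `order` would then map to one orchestrator task (e.g. one container step), so four nodes become two deployable units with a single dependency edge between them.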
For example, @gtauzin recently went through this journey, but I'm sure many others have, too. It's probably also worth looking at much of the existing work in this space. For example, especially in the data engineering space, what is the best way to deploy something like dbt? What do Airflow, Dagster, , etc. deployments look like for it? Do any stand out? What are the killer features?

Pieces of discussion with @astrojuanlu:
Yeah, it's not really being worked on. Realistically, to my chagrin, most people don't need something like Couler. I actually brought up my interest in this space during my past work on unifying abstractions, but was told that orchestrators are all kind of the same, and people just pick one and it works; that's probably true to some extent. (Even Couler is basically just built for Argo, because that's what the people using it use; so much for a unifying abstraction. 🤣)
Yes, 100% agree that this is a challenge, and it's also the one case where you want that Exedra- or Couler-esque abstraction.
It's also worth noting that if it doesn't make sense to deploy dbt to one of these tools, it quite possibly doesn't make sense to deploy Kedro data engineering pipelines there either. If Kedro is going to support these workflows, then you need to be able to tell the deployment story across DE and DS: will it require multiple deployment tools and plugins, or do you focus on those that support the full story, like Airflow, Dagster, and maybe Vertex AI? (From a quick search, I'm not seeing great resources on deploying dbt in SageMaker or Azure ML Pipelines, but correct me if I'm missing something.)
Overview
This parent issue tracks our ongoing efforts to improve Kedro deployment. Based on user research, we aim to address key challenges by enhancing plugins, refining documentation, and developing new features to better support our community's deployment needs.
Research Initiatives
We began our research in October 2024 with a user survey and follow-up interviews:
Key Insights and Challenges
Next Steps
We will continue to address these insights through targeted improvements and new feature development. This issue will track the progress of all related tasks and discussions, with updates and deliverables shared as they are completed.
Feel free to contribute, discuss, or raise additional concerns related to Kedro deployment in the comments below.