Draft Common Message Queue #46694
Here is a very early draft PR to introduce and socialize the concept of a "common message queue" abstraction, similar to the "Common SQL" and "Common IO" abstractions in Airflow. It will be a provider package like those, and is intended to begin as an abstraction over Apache Kafka, Amazon SQS, and Google PubSub. It can then be expanded to other messaging systems based on community adoption. The initial goal is to provide a simple abstraction for integrating the Event Driven Scheduling coming with Airflow 3 with message notification systems, such as Kafka, that are currently used to publish data availability. At this stage, this is very much a WIP draft intended to solicit input from the community.
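To make the idea of one abstraction over several messaging systems concrete, here is a minimal sketch of one way such a layer could pick a backend from a queue URI. It is not the code in this PR: `QueueBackend`, `register_backend`, `resolve_backend`, and the `kafka`/`sqs` URI schemes are all hypothetical names chosen for illustration, and no Airflow API is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlparse


@dataclass
class QueueBackend:
    """Hypothetical stand-in for a provider-specific client (not an Airflow class)."""

    name: str
    target: str


# Hypothetical registry mapping a URI scheme to a factory for the matching backend.
_BACKENDS: Dict[str, Callable[[str], QueueBackend]] = {}


def register_backend(scheme: str, factory: Callable[[str], QueueBackend]) -> None:
    """Make a backend available under the given URI scheme."""
    _BACKENDS[scheme] = factory


def resolve_backend(queue_uri: str) -> QueueBackend:
    """Pick the backend whose scheme matches the queue URI."""
    scheme = urlparse(queue_uri).scheme
    try:
        factory = _BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"No message queue backend registered for scheme {scheme!r}") from None
    return factory(queue_uri)


# Each messaging system registers itself once; callers only ever see the URI.
register_backend("kafka", lambda uri: QueueBackend("kafka", uri))
register_backend("sqs", lambda uri: QueueBackend("sqs", uri))
```

With a registry like this, supporting another messaging system would mean registering one more factory, without changing the code that DAG authors call.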
Updated the Common Message Queue Readme with an example of an Event Driven Dag
Updated the message queue Operator and Sensor to fix an issue in my sync
Changed the Message Queue Sensor Operator to be a Deferrable Trigger
Fixed typos and import errors in the MsgQueueHook
Implementation-wise, here is my thinking. I am starting by Given
Updated invocation of MsqQueueSensorTrigger to MsgQueueTrigger in example invocation
You are right, Vincent. I did think about the "Composition vs. Inheritance" tradeoff. The composition-style interface as defined here is easier for the DAG author, but means more maintenance for us.
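As a rough illustration of the composition style discussed here, the sketch below wraps a provider-specific trigger in a single common trigger that forwards its events. Everything in it is an assumption made for illustration: `FakeKafkaTrigger`, `FakeSqsTrigger`, `CommonQueueTrigger`, and `first_event` are hypothetical names, and the classes only mimic the async `run()` shape of a trigger rather than using Airflow's real base classes.

```python
import asyncio
from typing import AsyncIterator, Protocol


class ProviderTrigger(Protocol):
    """Shape assumed for a provider trigger: run() yields events asynchronously."""

    def run(self) -> AsyncIterator[str]: ...


class FakeKafkaTrigger:
    """Hypothetical provider-specific trigger that yields a single message."""

    def __init__(self, topic: str) -> None:
        self.topic = topic

    async def run(self) -> AsyncIterator[str]:
        yield f"kafka:{self.topic}"


class FakeSqsTrigger:
    """Hypothetical second provider trigger with a different constructor."""

    def __init__(self, queue: str) -> None:
        self.queue = queue

    async def run(self) -> AsyncIterator[str]:
        yield f"sqs:{self.queue}"


class CommonQueueTrigger:
    """Composition: holds a provider trigger and forwards its events unchanged."""

    def __init__(self, inner: ProviderTrigger) -> None:
        self._inner = inner

    async def run(self) -> AsyncIterator[str]:
        async for event in self._inner.run():
            yield event


async def first_event(trigger: ProviderTrigger) -> str:
    """Return the first event a trigger produces."""
    async for event in trigger.run():
        return event
    raise RuntimeError("trigger produced no events")


if __name__ == "__main__":
    print(asyncio.run(first_event(CommonQueueTrigger(FakeKafkaTrigger("orders")))))
```

The DAG author only ever touches `CommonQueueTrigger`, which is the ease-of-use side of the tradeoff; the maintenance cost is that the wrapper has to keep up with every provider trigger it fronts.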
In general, looks good. I have a few more nits on the Python code/interface, but we can leave those until this is in real review.
Would be great to add an example DAG as well for the showcase.
providers/common/msgq/src/airflow/providers/common/msgq/operators/msg_queue.py
I am iterating on that PR but the new provider is not recognized. I get:
With the new restructure, what is the process to add a new provider? Do I just need to create
I updated the PR. I focused only on the trigger side. Please let me know if this is what you had in mind for the trigger implementation. I really see it as a proxy of the provider triggers. I could not test it because the new provider is not recognized, but once that is solved I should be able to test it.
You need to look at the main And yes I updated https://github.com/apache/airflow/blob/main/providers/MANAGING_PROVIDERS_LIFECYCLE.rst#creating-a-new-community-provider - with the new structure and how to add a new provider, but that part is likely missing so after you figure it out, PRs there are most welcome. BTW. It will likely slightly change in the future as we will move airflow-core and others, but still it would be great to keep it updated.
Generally @vincbeck -> look at everything below
Thank you :D
06d84db to faf4fb0
faf4fb0 to b665607
I am not sure what is going on 👀
.....

.. note::
    This release of provider is only available for Airflow 2.9+ as explained in the
This doesn't match the 3.0+ requirement.
task = EmptyOperator(task_id="task")

chain(task)
- task = EmptyOperator(task_id="task")
- chain(task)
+ EmptyOperator(task_id="task")
No need for a var and chain if there is a single task.
This feels a little weird being in the test dir?