feat: add liveness probe for celery workers #766
Currently pods can enter a 'zombie' state where they become disconnected from celery and will no longer pick up new jobs. We create a liveness probe to detect such pods so they can be killed by k8s and recreated.
Signed-off-by: Nick Wood <[email protected]>
40dbe4e to e1a5afa
@thesuperzapper this one comes from our org, bringing a change that helped us upstream.
LGTM
Signed-off-by: Mathew Wicks <[email protected]>
@nickwood thanks for this!
I have made a significant change in 20e7546 to rewrite the probe in Python, so it should now be ready to include in the next release of the chart.
This is because the approach that you proposed did not work for all versions of Airflow (for example, it did not work in Airflow 2.7.0 because the `--app` name changed).
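For readers following along, here is a minimal sketch of what a Python-based probe of this kind might look like. It is not the code from 20e7546; the function names `evaluate_ping` and `probe`, the timeout value, and the way the broker URL is passed in are all assumptions made for illustration. The idea is to ping the local worker through Celery's Python API, which avoids depending on the CLI `--app` name.

```python
from typing import Dict, Optional, Tuple


def evaluate_ping(
    ping_responses: Optional[Dict[str, dict]], worker_name: str
) -> Tuple[bool, str]:
    """Decide whether the named worker answered a ping.

    Celery's inspect().ping() returns a dict keyed by worker name,
    or None when no worker replied, so None is handled explicitly.
    """
    if not ping_responses:
        return False, f"no ping replies received (expected '{worker_name}')"
    if worker_name not in ping_responses:
        return False, f"celery worker '{worker_name}' did not respond to ping"
    return True, "ok"


def probe(broker_url: str, worker_name: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Ping one worker over the broker (hypothetical helper, needs celery installed)."""
    # Imported lazily so evaluate_ping can be used and tested without celery.
    from celery import Celery

    app = Celery(broker=broker_url)
    replies = app.control.inspect(destination=[worker_name], timeout=timeout).ping()
    return evaluate_ping(replies, worker_name)
```

A liveness command could call `probe(...)` with a worker name such as `celery@<hostname>` (Celery's default node naming) and exit non-zero when the first element of the result is `False`, which lets k8s restart the pod.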
Signed-off-by: Mathew Wicks <[email protected]>
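FYI, for anyone who hits the airflow worker liveness probe failure `TypeError: argument of type 'NoneType' is not iterable`:
This happens because celery inspect cannot see the running worker.
We need to set the environment variable `AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL=True` instead of `False` to make the celery inspect ping work.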
Could we set this `AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL` field as
the default since the liveness probe is also default behavior?
On Fri, Nov 3, 2023 at 2:58 AM Nguyễn Lê Huy wrote:
FYI, for anyone who hits the airflow worker liveness probe failure TypeError:
argument of type 'NoneType' is not iterable.
This happens because celery inspect cannot see the running worker.
We need to set the environment variable
AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL=True instead of False to
make the celery inspect ping work.
It already defaults to 'true' in the Airflow config: https://airflow.apache.org/docs/apache-airflow-providers-celery/stable/configurations-ref.html#worker-enable-remote-control
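To make the failure mode in this thread concrete, the sketch below does two things: it checks whether remote control has been switched off through the environment, and it reproduces the kind of `TypeError` that a bare membership test raises when the ping result is `None`. Both helper names are hypothetical, the set of accepted "true" spellings is an assumption, and the default of true is taken from the config reference linked above.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL"


def remote_control_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless the variable is explicitly set to a false-like value.

    Assumption: "t", "true" and "1" (case-insensitive) count as enabled;
    an unset variable falls back to the documented default of true.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "True")
    return raw.strip().lower() in ("t", "true", "1")


def membership_error_on_none() -> str:
    """Show why an unguarded `name in ping_responses` check fails on None."""
    ping_responses = None  # what a ping yields when no worker replies
    try:
        "celery@worker-0" in ping_responses
    except TypeError as exc:
        return str(exc)
    return ""
```

If `remote_control_enabled()` returns `False`, a ping-based probe cannot succeed, so the setting is worth checking first when the probe keeps failing on an otherwise healthy worker.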
What issues does your PR fix?
What does your PR do?
Currently celery workers can enter a 'zombie' state where they become disconnected from celery and will no longer pick up new jobs.
This PR adds a liveness probe (enabled by default) to detect such pods so they can be killed by k8s and recreated.
Checklist
For all Pull Requests