Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimeLimitExceeded failures are not recorded to results table #367

Open
ericswpark opened this issue Feb 25, 2023 · 8 comments
Open

TimeLimitExceeded failures are not recorded to results table #367

ericswpark opened this issue Feb 25, 2023 · 8 comments

Comments

@ericswpark
Copy link

ericswpark commented Feb 25, 2023

When a task fails for exceeding the specified time limit, theFAILURE state is not saved to the results table.

I have to manually check the status of the task using a tool like flower to see if tasks have failed or not.

This may be related to issue #37, as I use update_state() to send back progress data to Django.

Here is the relevant code snippet:

https://github.com/shipperstack/shipper/blob/master/core/tasks.py#L101-L103

@auvipy
Copy link
Member

auvipy commented Mar 1, 2023

can you check this PR #366

@ericswpark
Copy link
Author

@auvipy I'll test the PR once it makes it into a release.

@ericswpark
Copy link
Author

@auvipy after testing with 2.5.0 I am still getting the same behavior, unfortunately.

Flower says the task has failed with a TimeLimitExceeded exception:

CAD02D90-BA48-43BA-B880-36837E2614A4

However on the results page it still shows PROGRESS for the task state:

4F1F7366-6E0B-493B-9AB3-27862897E20A

@auvipy
Copy link
Member

auvipy commented Mar 20, 2023

I suggest to dig deeper into the issue and find out the root cause

@ericswpark
Copy link
Author

@auvipy I tried looking through django-celery-results code, but I don't understand how it works. Does Celery workers/tasks send results to django-celery-results using some sort of callback, or does django-celery-results poll the workers/tasks for the current status/progress?

If it's the former, I'm thinking one of the following:

  1. Celery task fails but the worker is not reporting the failure to django-celery-results for some reason.
  2. The progress update function is called, then the task fails. Celery reports to django-celery-results of the failure, which is recorded to the database. Then the earlier progress update function finally finishes updating the database, overwriting the FAILURE status. But I highly doubt a race condition is the issue here, as nearly all of the tasks do not show a FAILURE status.
  3. Tasks launched with django-celery-beat are not properly reflected in the results table. Only the task entry is created and the SUCCESS/FAILURE status is not.

Any ideas here would be appreciated!

@ericswpark
Copy link
Author

I figured out that any invocations of the progress state update will cause the state of the task to get "stuck" within the results table:

https://github.com/shipperstack/shipper/blob/master/core/tasks.py#L101-L107

I think that this is because paramiko (the SFTP library used within the task) is asynchronous, and when the timeout is reached the last call to the state update is made after the FAILURE result is recorded to the database.

I guess I could have the callback check if the state is in FAILURE before updating, but am not sure if that won't have a race condition as well. Ideally it should not overwrite a SUCCESS/FAILURE status.

@ericswpark
Copy link
Author

@auvipy would it be possible to have django-celery-results await all asynchronous callbacks before writing the FAILURE result to the database? On Flower, the FAILURE result stemming from a TimeLimitExceeded exception is shown regardless of the state update being enabled or not.

I've tried alternatives, like not setting the state and only setting the meta information, but it seems like that will not work because the state then gets set to null. And as mentioned before, checking the state in the async function is also problematic since there could always be a race condition.

@auvipy
Copy link
Member

auvipy commented Mar 28, 2023

problem is none of the celery packages support async await as of yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants