Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix negative expires_ms and avoid worker freezing while using cron #479

Merged
merged 4 commits into from
Jan 6, 2025

Conversation

Matvey-Kuk
Copy link
Contributor

@Matvey-Kuk Matvey-Kuk commented Sep 11, 2024

We're encouraging issue with negative expires_ms while excessively using frequent cron jobs. It's happening rarely, but is causing worker freezes which is pretty annoying in production.

The issue is also mentioned here: #447

To reproduce:

  1. Schedule a cron task.
  2. Put a debugger breakpoint to https://github.com/python-arq/arq/blob/main/arq/worker.py#L752 (emulating some delay here: https://github.com/python-arq/arq/blob/main/arq/worker.py#L724)
  3. It will cause _defer_until to be in the past.
  4. It will cause expires_ms to be negative here: https://github.com/python-arq/arq/blob/main/arq/connections.py#L170
  5. Raises
Task exception was never retrieved
future: <Task finished name='Task-5' coro=<Worker.async_run() done, defined at /Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py:315> exception=ResponseError('Command # 1 (PSETEX arq:job:cron:process_background_ai_task:1726036281123 -319 �\x04�H\x00\x00\x00\x00\x00\x00\x00}�(�\x01t�N�\x01f��\x1fcron:process_background_ai_task��\x01a�)�\x01k�}��\x02et��\x06r��ߑ\x01u.) of pipeline caused error: ("invalid expire time in \'psetex\' command",)')>
Traceback (most recent call last):
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 320, in async_run
    await self.main_task
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 361, in main
    await self._poll_iteration()
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 401, in _poll_iteration
    await self.heart_beat()
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 727, in heart_beat
    await self.run_cron(now, cron_window_size)
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 759, in run_cron
    job_futures and await asyncio.gather(*job_futures)
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/connections.py", line 177, in enqueue_job
    await pipe.execute()
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/client.py", line 1399, in execute
    return await conn.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
           ^^^^^^^^^^
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/client.py", line 1287, in _execute_transaction
    self.raise_first_error(commands, response)
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/client.py", line 1326, in raise_first_error
    raise r
redis.exceptions.ResponseError: Command # 1 (PSETEX arq:job:cron:process_background_ai_task:1726036281123 -319 ��H}�(�t�N�f��cron:process_background_ai_task��a�)�k�}��et��r��ߑu.) of pipeline caused error: ("invalid expire time in 'psetex' command",)
  1. Worker will stay in the "frozen" state not executing tasks.

Thank you for the library! ❤️

@Matvey-Kuk Matvey-Kuk marked this pull request as draft September 11, 2024 06:44
@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #479      +/-   ##
==========================================
- Coverage   96.27%   96.01%   -0.27%     
==========================================
  Files          11       11              
  Lines        1074     1078       +4     
  Branches      209      190      -19     
==========================================
+ Hits         1034     1035       +1     
- Misses         19       21       +2     
- Partials       21       22       +1     
Files with missing lines Coverage Δ
arq/worker.py 97.17% <ø> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f752e2...c7ab742. Read the comment docs.

@Matvey-Kuk Matvey-Kuk marked this pull request as ready for review September 11, 2024 07:05
@Matvey-Kuk
Copy link
Contributor Author

@Tinche @JonasKs I found you authored nearby code, wonder if you could review this, thank you :)

@Matvey-Kuk
Copy link
Contributor Author

@samuelcolvin ? :)

@Matvey-Kuk
Copy link
Contributor Author

@chrisguidry 👋

@chrisguidry
Copy link
Contributor

Thanks @Matvey-Kuk! I'll work on getting a release ready

@chrisguidry chrisguidry merged commit 3bacb07 into python-arq:main Jan 6, 2025
11 checks passed
@Matvey-Kuk
Copy link
Contributor Author

Thank you @chrisguidry ! Moving Arq back to prod <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants