Limiting concurrent jobs #98
Replies: 4 comments 2 replies
-
I've been thinking about something like this myself. My concern is that this needs to happen before the job gets dispatched to the consumers, as they will just start clogging up due to exhaustion of file descriptors.
-
Cool, looks like I was wrong. I've run the script below until I got to 1.5e6 tasks, so my previous statement was wrong 😄

```python
import asyncio

import asyncpg

from pgqueuer.db import AsyncpgDriver
from pgqueuer.models import Job
from pgqueuer.qm import QueueManager
from pgqueuer.queries import Queries


async def main() -> None:
    qm_conn = await asyncpg.connect()
    driver = AsyncpgDriver(qm_conn)
    qm = QueueManager(driver)

    @qm.entrypoint("fetch")
    async def fetch(job: Job) -> None:
        # Never finish, so dequeued tasks accumulate in the consumer.
        await asyncio.sleep(float("inf"))

    enq_conn = await asyncpg.connect()
    q = Queries(AsyncpgDriver(enq_conn))
    N = 100_000

    async def enqueue() -> None:
        while True:
            await q.enqueue(
                ["fetch"] * N,
                [None] * N,
                [0] * N,
            )
            await asyncio.sleep(0.1)
            print(len(asyncio.all_tasks()))

    await asyncio.gather(enqueue(), qm.run(batch_size=N // 2))


asyncio.run(main())
```
-
Yeah, asyncio can handle quite a lot of idle tasks, so just having them pile up in the consumers is not a huge problem in itself. For me the issue is that (a) the tasks are actually CPU bound (although via a subprocess), and (b) the lack of back-pressure means that job distribution between consumers is inefficient, and bringing extra consumers up will not help because the tasks have already been dequeued.
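For context, a rough sketch of the kind of handler I mean, assuming a hypothetical `crunch` executable for the CPU-bound work and the `Job.payload` bytes field:

```python
import asyncio

from pgqueuer.models import Job


# Illustrative only: the event loop stays responsive while a hypothetical
# "crunch" executable burns CPU in a child process, so thousands of such
# tasks look "idle" to asyncio even though each one still holds a dequeued
# job, a subprocess, and its file descriptors.
async def fetch(job: Job) -> None:
    proc = await asyncio.create_subprocess_exec(
        "crunch",
        stdin=asyncio.subprocess.PIPE,
    )
    await proc.communicate(input=job.payload or b"")
```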
-
Resolved by #103
-
For my use case, jobs spawn CPU-intensive subprocesses, so I need to keep the maximum number of concurrent jobs on each consumer limited.
Currently, I don't think there's a way to impose a hard limit on the number of concurrent tasks that can be started for an entrypoint. The batch size parameter can be used to limit the number of tasks started in a single poll of the database, but if tasks take longer than the `dequeue_timeout` this is not a hard limit. Likewise, the `requests_per_second` rate limiting can only impose soft limits and can be difficult to tune if jobs vary widely in how long they take to execute.

It is possible in user code to use an `asyncio.Semaphore` within an entrypoint to manage resource utilization (see the sketch below), but this does not impose any back-pressure on dequeuing more jobs from the database. This can lead to situations where one consumer has dequeued many jobs that are waiting for the semaphore while another consumer is idle.
I would like to propose adding an additional `concurrency_limit` argument to `QueueManager.entrypoint`. This would impose a hard limit on the number of concurrent tasks for that entrypoint on each consumer and provide back-pressure, so jobs beyond the limit stay in the queue rather than being dequeued and parked behind a semaphore. I have a proof-of-concept implementation of this idea at colingavin/pgqueuer@604f6b9
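For illustration, usage might look like the sketch below; the `concurrency_limit=4` value is arbitrary, and the signature is only what the proof-of-concept proposes, not an existing pgqueuer API:

```python
from pgqueuer.models import Job
from pgqueuer.qm import QueueManager


def register(qm: QueueManager) -> None:
    # Proposed API: cap this entrypoint at four concurrent jobs on this
    # consumer and leave additional jobs in the queue (providing
    # back-pressure) until a slot frees up.
    @qm.entrypoint("fetch", concurrency_limit=4)
    async def fetch(job: Job) -> None:
        ...
```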