pgBoss.stop doesn't remove active jobs #303

Open
dolegi opened this issue Feb 4, 2022 · 5 comments


dolegi commented Feb 4, 2022

Hey,
first off, thanks so much for pgboss; it is an extremely useful library!

When calling pgBoss.stop() and waiting for the stopped event, jobs that take longer than the timeout get stuck in an active state.

What currently happens

We have some singleton jobs that run for anywhere between roughly 10 minutes and just over 1 hour, so we have set them to expire only after 120 minutes. When we re-deploy our job workers, the active jobs stay in pgboss until they expire, so a job doesn't get re-triggered until the stale active job (which no worker is working on any more) expires.

Request

Ideally, when re-deploying, we could catch the SIGTERM and call pgboss.stop({timeout: x}), which would stop the worker and remove any active jobs.
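For reference, the shutdown flow described in this request could be wired up roughly as follows. This is only a sketch: `stopOnSigterm` is a hypothetical helper name, the 30000 ms default is arbitrary, and `boss` is assumed to be an object that exposes `stop(options)` and emits a `stopped` event, as described in this issue. By itself it does not clear jobs that are still active when the timeout elapses, which is exactly the behaviour reported here.

```javascript
// Sketch: stop pg-boss when the process receives SIGTERM.
// `boss` is assumed to expose stop(options) and once(event, cb).
// `proc` defaults to the real process and is injectable for testing.
function stopOnSigterm(boss, { timeout = 30000, proc = process } = {}) {
  return new Promise((resolve, reject) => {
    proc.once('SIGTERM', () => {
      // Resolve once the instance reports that it has stopped.
      boss.once('stopped', resolve);
      // stop() may return a promise; surface its errors to the caller.
      Promise.resolve(boss.stop({ timeout })).catch(reject);
    });
  });
}
```

A caller might use it as `stopOnSigterm(boss, { timeout: 60000 }).then(() => process.exit(0));`.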

TL;DR Request

Have pgBoss.stop() delete/update active jobs when the worker stops.

Or should we be deleting active jobs ourselves, by tracking job IDs and manually updating the pgboss.job table? Is there a recommended way to approach this?

Related issues

#268

Thanks!

@dolegi dolegi changed the title stop doesn't remove active jobs pgBoss.stop doesn't remove active jobs Feb 4, 2022
Owner

timgit commented Feb 5, 2022

Hey, thanks! I agree with your suggestion, which is pretty similar to the expiration promise that is started along with jobs in the worker. I will look into an ideal way of opting into this.

Also, have you considered listening for SIGTERM in your worker callback function and failing the job yourself?
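A minimal sketch of that idea, assuming a rejected worker callback causes pg-boss to mark the job as failed: `failOnSigterm` is a hypothetical wrapper name, not part of pg-boss. It only settles the promise that pg-boss is waiting on; the wrapped handler's own work keeps running unless it checks for cancellation itself.

```javascript
// Wrap a worker callback so that it rejects when SIGTERM arrives.
// `proc` defaults to the real process and is injectable for testing.
function failOnSigterm(handler, proc = process) {
  return async (job) => {
    let onSignal;
    const interrupted = new Promise((_, reject) => {
      onSignal = () => reject(new Error('worker received SIGTERM'));
      proc.once('SIGTERM', onSignal);
    });
    try {
      // Whichever settles first wins: the real work or the signal.
      return await Promise.race([handler(job), interrupted]);
    } finally {
      // Always detach the listener so finished jobs do not leak handlers.
      proc.removeListener('SIGTERM', onSignal);
    }
  };
}
```

The wrapped function would then be passed wherever the original worker callback was registered.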

Author

dolegi commented Feb 7, 2022

Hi Tim, thanks for looking into it. We are considering updating the job statuses directly, but it feels wrong and contrary to how pgboss is meant to be used.

UPDATE pgboss.job SET state = '<abandoned>' WHERE state = 'active' AND id IN (<ids from the instance worker>);

We have to be careful to only update the job ids from the current instance, since other instance workers could still be actively processing jobs.
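One way to keep such an update scoped to the current instance is to record job IDs while their handlers run. The sketch below is an illustration only: `createJobTracker`, `track` and `buildUpdate` are hypothetical names, the `pgboss` schema and uuid job IDs are assumptions, and the target state is left to the caller. It builds a parameterized query object in the `{ text, values }` shape accepted by node-postgres. Writing to the table directly still bypasses the pg-boss API, as noted above.

```javascript
// Track the IDs of jobs that this worker instance is currently processing.
function createJobTracker() {
  const activeIds = new Set();
  return {
    // Wrap a worker callback so its job ID is recorded while it runs.
    track(handler) {
      return async (job) => {
        activeIds.add(job.id);
        try {
          return await handler(job);
        } finally {
          activeIds.delete(job.id);
        }
      };
    },
    // Build an UPDATE limited to this instance's active jobs.
    // `newState` is supplied by the caller and must be a state pg-boss accepts.
    buildUpdate(newState) {
      return {
        text: "UPDATE pgboss.job SET state = $1 WHERE state = 'active' AND id = ANY($2::uuid[])",
        values: [newState, [...activeIds]],
      };
    },
  };
}
```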

@StarpTech

Hi @timgit any updates on this?

Owner

timgit commented Aug 30, 2023

No work is being planned for this request right now. There is a reason SQS doesn't allow you to hold on to a message for hours, first of all. But long-running promises aside, I think the best approach would be to fail the jobs after the timeout. They would be eligible for retry at that point by another worker.
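Until something like this exists in the library, the "fail after the timeout" approach could be approximated in user code. The sketch below is not pg-boss behaviour: `failStragglers` is a hypothetical helper, `inFlight` is assumed to be a Map from job ID to the promise of the running handler, and `failJob` is supplied by the caller (for example a thin wrapper around `boss.fail`, whose exact signature may differ between pg-boss versions).

```javascript
// Wait up to `timeoutMs` for in-flight jobs, then fail whatever is still running.
async function failStragglers(inFlight, failJob, timeoutMs) {
  const pending = new Set(inFlight.keys());
  // Remove each job from `pending` once its handler settles, success or not.
  const settled = [...inFlight].map(([id, work]) =>
    Promise.resolve(work)
      .then(() => {}, () => {})
      .then(() => pending.delete(id)));
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });
  await Promise.race([Promise.all(settled), timedOut]);
  clearTimeout(timer);
  // Anything still pending outlived the timeout: hand it to failJob.
  const stragglers = [...pending];
  await Promise.all(stragglers.map((id) => failJob(id)));
  return stragglers;
}
```

Jobs failed this way would then depend on their retry settings to be picked up by another worker.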

Owner

timgit commented Aug 31, 2023

I'll consider adding this into v10
