run worker pod getting interrupted intermittently during a run #17935
Replies: 1 comment
Dupe of #17934
Hi, we’re experiencing weird issues since we upgraded Dagster to the latest version. Some jobs fail semi-randomly after about 30 seconds with:

```
Multiprocess executor: received termination signal - forwarding to active child processes
```
My current hypothesis is that it’s related to a scale-up issue, because the node doesn’t have enough resources. `kubectl describe` shows:

```
Warning  FailedScheduling  2m57s (x2 over 3m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
```
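If the run worker pod can’t be scheduled because of resource pressure, one mitigation is to give it explicit resource requests so the scheduler (and a cluster autoscaler, if one is running) can account for it. A minimal sketch using Dagster’s `dagster-k8s/config` tag; the job/op names and the specific CPU/memory values are placeholders, not taken from the report above:

```python
from dagster import job, op


@op
def my_op():
    ...


@job(
    tags={
        "dagster-k8s/config": {
            "container_config": {
                # Requests/limits for the run worker container; the values
                # here are illustrative and should match your workload.
                "resources": {
                    "requests": {"cpu": "500m", "memory": "1Gi"},
                    "limits": {"cpu": "1", "memory": "2Gi"},
                }
            }
        }
    }
)
def my_job():
    my_op()
```

With explicit requests in place, a `FailedScheduling` event should at least trigger a node scale-up (where an autoscaler exists) instead of leaving the pod pending or landing it on an overcommitted node.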
Describing the job shows:

```
Pods Statuses:  0 Active (0 Ready) / 1 Succeeded / 1 Failed
```
I think there is a bug somewhere. I tried setting a `backoffLimit`, but the issue remains. There is a thread that I think is related to this issue: #6236
Do you have any idea what could be happening here? Thanks in advance.
The question was originally asked in Dagster Slack.