run worker pod getting interrupted intermittently during a run #17936
Replies: 1 comment
-
Dupe of #17934
-
Hi, we’re experiencing weird issues since we upgraded Dagster to the latest version. Some jobs fail semi-randomly after about 30 seconds with:
Multiprocess executor: received termination signal - forwarding to active child processes
My current hypothesis is that it’s related to a scale-up issue, because the node doesn’t have enough resources. kubectl describe on the pod shows:
Warning FailedScheduling 2m57s (x2 over 3m) default-scheduler 0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
Describing the job shows:
Pods Statuses: 0 Active (0 Ready) / 1 Succeeded / 1 Failed
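
For reference, one way to make scheduling deterministic is to pin CPU/memory requests on the run worker pod via the dagster-k8s/config tag. This is only a minimal sketch: my_job, my_op, and the resource values are placeholders, and it assumes the K8sRunLauncher (or k8s_job_executor) is in use.

```python
from dagster import job, op

@op
def my_op():
    ...

# Sketch: pin CPU/memory on the run worker pod so the scheduler either places
# it immediately or triggers a node scale-up of a known size.
# my_job/my_op and the request/limit values below are placeholders.
@job(
    tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": "500m", "memory": "1Gi"},
                    "limits": {"cpu": "1", "memory": "2Gi"},
                },
            },
        },
    }
)
def my_job():
    my_op()
```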
I think there is a bug somewhere. I tried setting a backoffLimit on the Kubernetes job (see the sketch below), but the issue remains. There is a thread that I think is related to this issue: #6236
Do you have any idea what could be happening here? Thanks in advance.
The question was originally asked in Dagster Slack.