# Add pod logging (dagster-io#26196)
## Summary & Motivation
This is a follow-up to dagster-io#22784. Pod logging is still necessary even after the `delete_failed_k8s_jobs` implementation, because that option isn't feasible in production and/or highly dynamic environments.

Addresses previous review comments from @gibsondan.

## How I Tested These Changes
Tested locally by inducing an error; the pod debug information was logged as expected.

## Changelog

Output k8s pod logs when pods fail.
apetryla authored and pskinnerthyme committed Dec 16, 2024
1 parent a2a585a commit ebe1df7
Showing 1 changed file with 14 additions and 0 deletions.
`python_modules/libraries/dagster-k8s/dagster_k8s/ops/k8s_job_op.py`:

```diff
@@ -419,6 +419,20 @@ def execute_k8s_job(
             num_pods_to_wait_for=num_pods_to_wait_for,
         )
     except (DagsterExecutionInterruptedError, Exception) as e:
+        try:
+            pods = api_client.get_pod_names_in_job(job_name=job_name, namespace=namespace)
+            pod_debug_info = "\n\n".join(
+                [api_client.get_pod_debug_info(pod_name, namespace) for pod_name in pods]
+            )
+        except Exception:
+            context.log.exception(
+                f"Error trying to get pod debug information for failed k8s job {job_name}"
+            )
+        else:
+            context.log.error(
+                f"Debug information for failed k8s job {job_name}:\n\n{pod_debug_info}"
+            )
+
         if delete_failed_k8s_jobs:
             context.log.info(
                 f"Deleting Kubernetes job {job_name} in namespace {namespace} due to exception"
```
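For context, here is a minimal sketch of how the new behavior surfaces in practice, using the documented `execute_k8s_job` API from `dagster_k8s`. The op, job, image, command, and args below are illustrative placeholders: the container deliberately exits non-zero, so with this change the failed pod's debug information is written to the op's logs before any cleanup.

```python
from dagster import OpExecutionContext, job, op
from dagster_k8s import execute_k8s_job


@op
def failing_k8s_op(context: OpExecutionContext):
    # Runs a short-lived container that exits non-zero; with this change,
    # the resulting pod's debug info is logged when the k8s job fails.
    execute_k8s_job(
        context=context,
        image="busybox",  # placeholder image
        command=["/bin/sh", "-c"],
        args=["exit 1"],  # induce a failure, as in the local test above
    )


@job
def failing_k8s_job():
    failing_k8s_op()
```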
