[DOC-353] Alerts in Dagster+
OwenKephart committed Aug 27, 2024
1 parent 2d7d4f1 commit 986b640
Showing 18 changed files with 826 additions and 2 deletions.
321 changes: 319 additions & 2 deletions docs/docs-beta/docs/dagster-plus/deployment/alerts.md
@@ -1,5 +1,322 @@
---
title: "Dagster+ alerts"
title: Setting up alerts on Dagster+
sidebar_position: 30
sidebar_label: "Dagster+ Alerts"
---
[comment]: <> (This file is automatically generated by `docs_beta_snippets/guides/monitor-alert/alerting/generate.py`)

# Dagster+ alerts
Dagster+ allows you to configure alerts to automatically fire in response to a range of events. These alerts can be sent to a variety of different services, depending on your organization's needs.

These alerts can be configured in the Dagster+ UI, or using the `dagster-cloud` CLI tool.

<details>
<summary>Prerequisites</summary>
- **Organization**, **Admin**, or **Editor** permissions on Dagster+
</details>

## Configuring a notification service

To start, you'll need to configure a service to send alerts. Dagster+ currently supports sending alerts through email, Microsoft Teams, PagerDuty, and Slack.

<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
No additional configuration is required to send emails from Dagster+.

All alert emails will be sent by `[email protected]`. Alerts can be configured to be sent to any number of emails.
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
Create an incoming webhook by following the [Microsoft Teams documentation](https://learn.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook?tabs=newteams%2Cdotnet).

This will provide you with a **webhook URL**, which is required when configuring alerts in the UI (after selecting "Microsoft Teams" as your Notification Service) or using the CLI (in the `notification_service` configuration).

</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
:::note
You will need sufficient permissions in PagerDuty to add or edit services.
:::

In PagerDuty, you can either:

- [Create a new service](https://support.pagerduty.com/main/docs/services-and-integrations#create-a-service), and add Dagster+ as an integration, or
- [Edit an existing service](https://support.pagerduty.com/main/docs/services-and-integrations#edit-service-settings) to include Dagster+ as an integration

When configuring the integration, choose **Dagster+** as the integration type, and choose an integration name in the format `dagster-plus-{your_service_name}`.

After adding your new integration, you will be taken to a screen containing an **Integration Key**. This value is required when configuring alerts in the UI (after selecting "PagerDuty" as your Notification Service) or using the CLI (in the `notification_service` configuration).

</TabItem>
<TabItem value='slack' label='Slack'>
:::note
You will need sufficient permissions in Slack to add apps to your workspace.
:::
Navigate to **Deployment > Alerts** in the Dagster+ UI and click **Connect to Slack**. From there, you can complete the installation process.

When setting up an alert, you can choose a Slack channel to send those alerts to. Make sure to invite the `@Dagster+` bot to any channel that you'd like to receive an alert in.

</TabItem>
</Tabs>

## Alerting when a run fails
You can set up alerts to notify you when a run fails.

By default, these alerts will target all runs in the deployment, but they can be scoped to runs with a specific tag.
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Run alert** from the dropdown.

5. Select **Job failure**.

If desired, add **tags** in the format `{key}:{value}` to filter the runs that will be considered.
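
As a rough illustration of the `key:value` tag format, the sketch below parses filter strings and checks them against a run's tags. The helper names `parse_tag` and `run_matches` and the sample tags are hypothetical, and the rule that a run must carry every listed tag is an assumption for illustration, not a description of how Dagster+ evaluates filters.

```python
def parse_tag(tag: str) -> tuple[str, str]:
    """Split a 'key:value' string at the first colon."""
    key, sep, value = tag.partition(":")
    if not sep or not key:
        raise ValueError(f"expected 'key:value', got {tag!r}")
    return key, value


def run_matches(run_tags: dict[str, str], filters: list[str]) -> bool:
    """Assumption: a run is considered only if it carries every filter tag."""
    return all(run_tags.get(k) == v for k, v in map(parse_tag, filters))


# Hypothetical tags attached to a run.
run_tags = {"team": "data-eng", "priority": "high"}

print(run_matches(run_tags, ["team:data-eng"]))   # True
print(run_matches(run_tags, ["team:analytics"]))  # False
```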

</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>

## Alerting when a run is taking too long to complete
You can set up alerts to notify you whenever a run takes more than some threshold amount of time.

By default, these alerts will target all runs in the deployment, but they can be scoped to runs with a specific tag.
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Run alert** from the dropdown.

5. Select **Job running over** and a number of hours to alert after.

If desired, add **tags** in the format `{key}:{value}` to filter the runs that will be considered.

</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/job-running-over-one-hour-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/job-running-over-one-hour-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/job-running-over-one-hour-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/job-running-over-one-hour-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>

## Alerting when an asset fails to materialize
You can set up alerts to notify you when an asset materialization attempt fails.

By default, these alerts will target all assets in the deployment, but they can be scoped to a specific asset or group of assets.
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Asset alert** from the dropdown.

5. Select **Failure** under the **Materializations** heading.

If desired, select a **target** from the dropdown menu to scope this alert to a specific asset or group.

</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>

## Alerting when an asset check fails
You can set up alerts to notify you when an asset check fails.

By default, these alerts will target all assets in the deployment, but they can be scoped to checks on a specific asset or group of assets.
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Asset alert** from the dropdown.

5. Select **Failed (ERROR)** under the **Asset Checks** heading.

If desired, select a **target** from the dropdown menu to scope this alert to a specific asset or group.

</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/asset-check-failed-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/asset-check-failed-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/asset-check-failed-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/asset-check-failed-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>

## Alerting when a schedule or sensor tick fails
You can set up alerts to fire when any schedule or sensor tick across your entire deployment fails.

Alerts are sent only when a schedule/sensor transitions from **success** to **failure**, so only the initial failure will trigger the alert.
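
The transition rule above can be pictured with a small sketch. This is an illustrative model only, not Dagster+'s implementation: the function name `alerting_ticks`, the two status strings, and the choice to treat the state before the first tick as healthy are all assumptions.

```python
def alerting_ticks(statuses: list[str]) -> list[int]:
    """Return the indices of ticks that would fire an alert.

    Only "success" and "failure" are modeled. A tick fires an alert when it
    fails and the previous tick succeeded, so repeated failures stay quiet.
    """
    fired = []
    previous = "success"  # assumption: the state before the first tick is healthy
    for index, status in enumerate(statuses):
        if status == "failure" and previous == "success":
            fired.append(index)
        previous = status
    return fired


history = ["success", "failure", "failure", "success", "failure"]
print(alerting_ticks(history))  # [1, 4]
```

In this model the second consecutive failure (index 2) does not alert, while the failure after a recovery (index 4) does.
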
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Schedule/Sensor alert** from the dropdown.
</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/schedule-sensor-failure-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>

## Alerting when a code location fails to load
You can set up alerts to fire when any code location fails to load due to an error.
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Code location error alert** from the dropdown.
</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>

## Alerting when a Hybrid agent becomes unavailable
:::note
This is only available for [Hybrid](/todo) deployments.
:::

You can set up alerts to fire if your Hybrid agent hasn't sent a heartbeat in the last 5 minutes.
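
A minimal sketch of the heartbeat condition, assuming the check is a plain comparison of timestamps. The function name `agent_unavailable` and the sample times are hypothetical. Only the 5-minute window comes from the text above.

```python
from datetime import datetime, timedelta, timezone

# The 5-minute window described above.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def agent_unavailable(last_heartbeat: datetime, now: datetime) -> bool:
    """True when the last heartbeat is older than the timeout window."""
    return now - last_heartbeat > HEARTBEAT_TIMEOUT


# Hypothetical timestamps for illustration.
now = datetime(2024, 8, 27, 12, 0, tzinfo=timezone.utc)
print(agent_unavailable(now - timedelta(minutes=2), now))  # False
print(agent_unavailable(now - timedelta(minutes=7), now))  # True
```
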
<Tabs groupId="ui_or_code">
<TabItem value='ui' label='In the UI'>
1. In the Dagster UI, click **Deployment**.
2. Click the **Alerts** tab.
3. Click **Add alert policy**.
4. Select **Code location error alert** from the dropdown.
</TabItem>
<TabItem value='code' label='In code'>
Execute the following command to sync the configured alert policy to your Dagster+ deployment.

```bash
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml
```
<Tabs groupId="notification_service">
<TabItem value='email' label='Email'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-email.yaml" language="yaml" />
</TabItem>
<TabItem value='microsoft_teams' label='Microsoft Teams'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-microsoft_teams.yaml" language="yaml" />
</TabItem>
<TabItem value='pagerduty' label='PagerDuty'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-pagerduty.yaml" language="yaml" />
</TabItem>
<TabItem value='slack' label='Slack'>
<CodeExample filePath="dagster-plus/deployment/alerts/code-location-error-slack.yaml" language="yaml" />
</TabItem>
</Tabs>

</TabItem>
</Tabs>
@@ -0,0 +1,20 @@
# alert_policies.yaml

alert_policies:
  alert_targets:
    - asset_key_target:
        asset_key:
          - s3
          - report
    - asset_group_target:
        asset_group: transformed
        location_name: prod
        repo_name: __repository__
  description: Sends an email when an asset check fails.
  event_types:
    - ASSET_CHECK_SEVERITY_ERROR
  name: asset-check-failed-email
  notification_service:
    email_addresses:
      - [email protected]
      - [email protected]
@@ -0,0 +1,18 @@
# alert_policies.yaml

alert_policies:
  alert_targets:
    - asset_key_target:
        asset_key:
          - s3
          - report
    - asset_group_target:
        asset_group: transformed
        location_name: prod
        repo_name: __repository__
  description: Sends a Microsoft Teams webhook when an asset check fails.
  event_types:
    - ASSET_CHECK_SEVERITY_ERROR
  name: asset-check-failed-microsoft_teams
  notification_service:
    webhook_url: https://yourdomain.webhook.office.com/...
@@ -0,0 +1,18 @@
# alert_policies.yaml

alert_policies:
  alert_targets:
    - asset_key_target:
        asset_key:
          - s3
          - report
    - asset_group_target:
        asset_group: transformed
        location_name: prod
        repo_name: __repository__
  description: Sends a PagerDuty alert when an asset check fails.
  event_types:
    - ASSET_CHECK_SEVERITY_ERROR
  name: asset-check-failed-pagerduty
  notification_service:
    integration_key: <pagerduty_integration_key>