commit 65934c7 (1 parent: 52c25d2)
Showing 18 changed files with 824 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,322 @@ | ||
--- | ||
title: "Setting up alerts" | ||
title: Setting up alerts on Dagster+ | ||
sidebar_position: 30 | ||
sidebar_label: "Alerting" | ||
sidebar_label: "Alerting on Dagster+" | ||
--- | ||
[comment]: <> (This file is automatically generated by `docs_beta_snippets/guides/monitor-alert/alerting/generate.py`) | ||
|
||
Alerting if my pipeline didn't execute | ||
Tracking when a run or sensor fails | ||
Knowing when a pipeline never ran | ||
Knowing if a pipeline is running slow, or an asset is late | ||
Knowing if my Dagster instance is having issues | ||
Dagster+ allows you to configure alerts to automatically fire in response to a range of events. These alerts can be sent to a variety of different services, depending on your organization's needs. | ||
|
||
These alerts can be configured in the Dagster+ UI, or using the `dagster-cloud` CLI tool. | ||
|
||
<details> | ||
<summary>Prerequisites</summary> | ||
- **Organization**, **Admin**, or **Editor** permissions on Dagster+ | ||
</details> | ||
|
||
## Configuring a notification service | ||
|
||
To start, you'll need to configure a service to send alerts. Dagster+ currently supports sending alerts through email, Microsoft Teams, PagerDuty, and Slack. | ||
|
||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
No additional configuration is required to send emails from Dagster+. | ||
|
||
All alert emails will be sent by `[email protected]`. Alerts can be configured to be sent to any number of emails. | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
Create an incoming webhook by following the [Microsoft Teams documentation](https://learn.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook?tabs=newteams%2Cdotnet). | ||
|
||
This will provide you with a **webhook URL**, which will be required when configuring alerts in the UI (after selecting "Microsoft Teams" as your Notification Service) or using the CLI (in the `notification_service` configuration). | ||
|
||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
:::note | ||
You will need sufficient permissions in PagerDuty to add or edit services. | ||
::: | ||
|
||
In PagerDuty, you can either: | ||
|
||
- [Create a new service](https://support.pagerduty.com/main/docs/services-and-integrations#create-a-service), and add Dagster+ as an integration, or | ||
- [Edit an existing service](https://support.pagerduty.com/main/docs/services-and-integrations#edit-service-settings) to include Dagster+ as an integration | ||
|
||
When configuring the integration, choose **Dagster+** as the integration type, and choose an integration name in the format `dagster-plus-{your_service_name}`. | ||
|
||
After adding your new integration, you will be taken to a screen containing an **Integration Key**. This value will be required when configuring alerts in the UI (after selecting "PagerDuty" as your Notification Service) or using the CLI (in the `notification_service` configuration). | ||
|
||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
:::note | ||
You will need sufficient permissions in Slack to add apps to your workspace. | ||
::: | ||
Navigate to **Deployment > Alerts** in the Dagster+ UI and click **Connect to Slack**. From there, you can complete the installation process. | ||
|
||
When setting up an alert, you can choose a Slack channel to send those alerts to. Make sure to invite the `@Dagster+` bot to any channel that you'd like to receive an alert in. | ||
|
||
</TabItem> | ||
</Tabs> | ||
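
As a rough summary of the tabs above, each service needs one piece of configuration before an alert policy can use it. The sketch below is illustrative only and is not part of Dagster+: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names. `email_addresses`, `webhook_url`, and `integration_key` are the field names used in the example policy files in this commit, while `slack_channel` is an assumed placeholder.

```python
# Illustrative checklist only: this is not a Dagster+ API.
REQUIRED_FIELDS = {
    "email": ["email_addresses"],
    "microsoft_teams": ["webhook_url"],
    "pagerduty": ["integration_key"],
    "slack": ["slack_channel"],  # assumed placeholder name
}


def missing_fields(service: str, config: dict) -> list[str]:
    """Return the required fields that are absent or empty for a service."""
    if service not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported notification service: {service}")
    return [name for name in REQUIRED_FIELDS[service] if not config.get(name)]
```

For example, `missing_fields("pagerduty", {})` reports that the integration key still has to be supplied.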
|
||
## Alerting when a run fails | ||
You can set up alerts to notify you when a run fails. | ||
|
||
By default, these alerts will target all runs in the deployment, but they can be scoped to runs with a specific tag. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Run alert** from the dropdown. | ||
|
||
5. Select **Job failure**. | ||
|
||
If desired, add **tags** in the format `{key}:{value}` to filter the runs that will be considered. | ||
|
||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> | ||
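
The `{key}:{value}` tag filter described above can be pictured with a small sketch. This is not how Dagster+ implements the filter; `parse_tag` and `run_matches` are hypothetical helpers, and the sketch assumes that a run must carry every listed tag to be considered.

```python
def parse_tag(tag: str) -> tuple[str, str]:
    """Split a `key:value` string on the first colon."""
    key, sep, value = tag.partition(":")
    if not sep or not key.strip():
        raise ValueError(f"expected 'key:value', got {tag!r}")
    return key.strip(), value.strip()


def run_matches(run_tags: dict[str, str], filters: list[str]) -> bool:
    """True if the run carries every filter tag (assumed AND semantics).

    An empty filter list matches all runs, mirroring the default scope.
    """
    return all(run_tags.get(k) == v for k, v in map(parse_tag, filters))
```

With no filters, every run is in scope; adding `team:data` narrows the alert to runs tagged with that key and value.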
|
||
## Alerting when a run is taking too long to complete | ||
You can set up alerts to notify you whenever a run takes more than some threshold amount of time. | ||
|
||
By default, these alerts will target all runs in the deployment, but they can be scoped to runs with a specific tag. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Run alert** from the dropdown. | ||
|
||
5. Select **Job running over** and a number of hours to alert after. | ||
|
||
|
||
If desired, add **tags** in the format `{key}:{value}` to filter the runs that will be considered. | ||
|
||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/job-running-over-one-hour-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/job-running-over-one-hour-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/job-running-over-one-hour-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/job-running-over-one-hour-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> | ||
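
The threshold check behind a "running over" alert amounts to a simple time comparison. The following is a minimal sketch under that assumption, not Dagster+ code; `is_running_over` is a hypothetical name.

```python
from datetime import datetime, timedelta


def is_running_over(started_at: datetime, now: datetime, threshold_hours: float) -> bool:
    """True once a still-running run has exceeded the configured threshold."""
    return now - started_at > timedelta(hours=threshold_hours)
```

With a one-hour threshold, a run that started 61 minutes ago would qualify, while one that started 59 minutes ago would not.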
|
||
## Alerting when an asset fails to materialize | ||
You can set up alerts to notify you when an asset materialization attempt fails. | ||
|
||
By default, these alerts will target all assets in the deployment, but they can be scoped to a specific asset or group of assets. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Asset alert** from the dropdown. | ||
|
||
5. Select **Failure** under the **Materializations** heading. | ||
|
||
If desired, select a **target** from the dropdown menu to scope this alert to a specific asset or group. | ||
|
||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
## Alerting when an asset check fails | ||
You can set up alerts to notify you when an asset check fails. | ||
|
||
By default, these alerts will target all assets in the deployment, but they can be scoped to checks on a specific asset or group of assets. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Asset alert** from the dropdown. | ||
|
||
5. Select **Failed (ERROR)** under the **Asset Checks** heading. | ||
|
||
If desired, select a **target** from the dropdown menu to scope this alert to a specific asset or group. | ||
|
||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/asset-check-failed-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/asset-check-failed-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/asset-check-failed-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/asset-check-failed-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> | ||
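
To make the scoping rules concrete, here is a sketch of how an asset could be matched against alert targets. It is an illustration, not the Dagster+ implementation: `asset_in_scope` is a hypothetical helper, and the nesting of `EXAMPLE_TARGETS` is an assumption modeled on the `asset_key_target` and `asset_group_target` entries in the example policy files in this commit.

```python
# Assumed structure, modeled on the example alert_policies.yaml files.
EXAMPLE_TARGETS = [
    {"asset_key_target": {"asset_key": ["s3", "report"]}},
    {
        "asset_group_target": {
            "asset_group": "transformed",
            "location_name": "prod",
            "repo_name": "__repository__",
        }
    },
]


def asset_in_scope(asset_key, group, location, targets):
    """True if the asset matches any target; no targets means all assets."""
    if not targets:
        return True
    for target in targets:
        key_target = target.get("asset_key_target")
        if key_target and list(key_target["asset_key"]) == list(asset_key):
            return True
        group_target = target.get("asset_group_target")
        if (
            group_target
            and group_target["asset_group"] == group
            and group_target["location_name"] == location
        ):
            return True
    return False
```

Under these assumptions, the asset `s3/report` is in scope by key, and any asset in the `transformed` group of the `prod` code location is in scope by group.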
|
||
## Alerting when a schedule or sensor tick fails | ||
You can set up alerts to fire when any schedule or sensor tick across your entire deployment fails. | ||
|
||
Alerts are sent only when a schedule/sensor transitions from **success** to **failure**, so only the initial failure will trigger the alert. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Schedule/Sensor alert** from the dropdown. | ||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/schedule-sensor-failure-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> | ||
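
The success-to-failure rule above means that a streak of consecutive failures produces a single alert. A minimal sketch of that rule follows; `alerting_ticks` is a hypothetical function, the status strings are placeholders, and the sketch assumes the schedule or sensor was healthy before the first tick.

```python
def alerting_ticks(statuses: list[str]) -> list[int]:
    """Return indices of ticks that would alert.

    A tick alerts only when it fails and the previous tick succeeded.
    """
    alerts = []
    previous = "success"  # assumption: healthy before the first recorded tick
    for index, status in enumerate(statuses):
        if status == "failure" and previous == "success":
            alerts.append(index)
        previous = status
    return alerts
```

For the sequence success, failure, failure, success, failure, only the second and fifth ticks would alert; the third is a repeat failure and stays silent.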
|
||
## Alerting when a code location fails to load | ||
You can set up alerts to fire when any code location fails to load due to an error. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Code location error alert** from the dropdown. | ||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
## Alerting when a Hybrid agent becomes unavailable | ||
:::note | ||
This is only available for [Hybrid](/todo) deployments. | ||
::: | ||
|
||
You can set up alerts to fire if your Hybrid agent hasn't sent a heartbeat in the last 5 minutes. | ||
<Tabs groupId="ui_or_code"> | ||
<TabItem value='ui' label='In the UI'> | ||
1. In the Dagster UI, click **Deployment**. | ||
2. Click the **Alerts** tab. | ||
3. Click **Add alert policy**. | ||
4. Select **Code location error alert** from the dropdown. | ||
</TabItem> | ||
<TabItem value='code' label='In code'> | ||
Execute the following command to sync the configured alert policy to your Dagster+ deployment. | ||
|
||
```bash | ||
dagster-cloud deployment alert-policies sync -a /path/to/alert_policies.yaml | ||
``` | ||
<Tabs groupId="notification_service"> | ||
<TabItem value='email' label='Email'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-email.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='microsoft_teams' label='Microsoft Teams'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-microsoft_teams.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='pagerduty' label='PagerDuty'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-pagerduty.yaml" language="yaml" /> | ||
</TabItem> | ||
<TabItem value='slack' label='Slack'> | ||
<CodeExample filePath="guides/monitor-alert/alerting/code-location-error-slack.yaml" language="yaml" /> | ||
</TabItem> | ||
</Tabs> | ||
|
||
</TabItem> | ||
</Tabs> |
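
The five-minute heartbeat rule can be read as a staleness check. The sketch below illustrates that reading only and is not Dagster+ code; `HEARTBEAT_TIMEOUT` and `agent_unavailable` are hypothetical names.

```python
from datetime import datetime, timedelta

# The 5-minute window comes from the description above.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def agent_unavailable(last_heartbeat: datetime, now: datetime) -> bool:
    """True if no heartbeat has arrived within the timeout window."""
    return now - last_heartbeat > HEARTBEAT_TIMEOUT
```

An agent whose last heartbeat was six minutes ago would be treated as unavailable; one that reported four minutes ago would not.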
20 changes: 20 additions & 0 deletions
...a_snippets/docs_beta_snippets/guides/monitor-alert/alerting/asset-check-failed-email.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# alert_policies.yaml | ||
|
||
alert_policies: | ||
alert_targets: | ||
- asset_key_target: | ||
asset_key: | ||
- s3 | ||
- report | ||
- asset_group_target: | ||
asset_group: transformed | ||
location_name: prod | ||
repo_name: __repository__ | ||
description: Sends an email when an asset check fails. | ||
event_types: | ||
- ASSET_CHECK_SEVERITY_ERROR | ||
name: asset-check-failed-email | ||
notification_service: | ||
email_addresses: | ||
- [email protected] | ||
- [email protected] |
18 changes: 18 additions & 0 deletions
.../docs_beta_snippets/guides/monitor-alert/alerting/asset-check-failed-microsoft_teams.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# alert_policies.yaml | ||
|
||
alert_policies: | ||
alert_targets: | ||
- asset_key_target: | ||
asset_key: | ||
- s3 | ||
- report | ||
- asset_group_target: | ||
asset_group: transformed | ||
location_name: prod | ||
repo_name: __repository__ | ||
description: Sends a Microsoft Teams webhook when an asset check fails. | ||
event_types: | ||
- ASSET_CHECK_SEVERITY_ERROR | ||
name: asset-check-failed-microsoft_teams | ||
notification_service: | ||
webhook_url: https://yourdomain.webhook.office.com/... |
18 changes: 18 additions & 0 deletions
...ippets/docs_beta_snippets/guides/monitor-alert/alerting/asset-check-failed-pagerduty.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# alert_policies.yaml | ||
|
||
alert_policies: | ||
alert_targets: | ||
- asset_key_target: | ||
asset_key: | ||
- s3 | ||
- report | ||
- asset_group_target: | ||
asset_group: transformed | ||
location_name: prod | ||
repo_name: __repository__ | ||
description: Sends a PagerDuty alert when an asset check fails. | ||
event_types: | ||
- ASSET_CHECK_SEVERITY_ERROR | ||
name: asset-check-failed-pagerduty | ||
notification_service: | ||
integration_key: <pagerduty_integration_key> |