
fix merge conflict
Signed-off-by: nikki everett <[email protected]>
neverett committed Dec 18, 2024
2 parents 1b4a76a + 910838e commit f17305e
Showing 363 changed files with 10,661 additions and 2,343 deletions.
3 changes: 2 additions & 1 deletion .gitattributes
@@ -1 +1,2 @@
*.py diff=python
**/uv.lock linguist-generated
28 changes: 28 additions & 0 deletions CHANGES.md
@@ -1,5 +1,33 @@
# Changelog

## 1.9.5 (core) / 0.25.5 (libraries)

### New

- The automatic run retry daemon has been updated so that there is a single source of truth for whether a run will be retried and whether the retry has been launched. Tags are now added to runs at failure time indicating whether the run will be retried by the automatic retry system. Once the automatic retry has been launched, the run ID of the retry is added to the original run.
- When canceling a backfill of a job, the backfill daemon will now cancel all runs launched by that backfill before marking the backfill as canceled.
- Dagster execution info (tags such as `dagster/run-id`, `dagster/code-location`, `dagster/user` and Dagster Cloud environment variables) typically attached to external resources are now available under `DagsterRun.dagster_execution_info`.
- `SensorReturnTypesUnion` is now exported for typing the output of sensor functions.
- [dagster-dbt] dbt seeds now get a valid code version (Thanks [@marijncv](https://github.com/marijncv)!).
- Manual and automatic retries of runs launched by backfills that occur while the backfill is still in progress are now incorporated into the backfill's status.
- Manual retries of runs launched by backfills are no longer considered part of the backfill if the backfill is complete when the retry is launched.
- [dagster-fivetran] Fivetran assets can now be materialized using the `FivetranWorkspace.sync_and_poll(...)` method in the definition of a `@fivetran_assets`-decorated function.
- [dagster-fivetran] `load_fivetran_asset_specs` has been updated to accept an instance of `DagsterFivetranTranslator` or a custom subclass.
- [dagster-fivetran] The `fivetran_assets` decorator was added. It can be used with the `FivetranWorkspace` resource and `DagsterFivetranTranslator` translator to load Fivetran tables for a given connector as assets in Dagster. The `build_fivetran_assets_definitions` factory can be used to create assets for all the connectors in your Fivetran workspace (see the sketch after this list).
- [dagster-aws] `ECSPipesClient.run` now waits up to 70 days for task completion (waiter parameters are configurable) (Thanks [@jenkoian](https://github.com/jenkoian)!)
- [dagster-dbt] Update dagster-dbt scaffold template to be compatible with uv (Thanks [@wingyplus](https://github.com/wingyplus)!).
- [dagster-airbyte] A `load_airbyte_cloud_asset_specs` function has been added. It can be used with the `AirbyteCloudWorkspace` resource and `DagsterAirbyteTranslator` translator to load your Airbyte Cloud connection streams as external assets in Dagster.
- [ui] Add an icon for the `icechunk` kind.
- [ui] Improved UI for manual sensor/schedule evaluation.
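
A sketch of the new decorator-based Fivetran API described in the entries above; the connector ID, credentials, and names are placeholders, not values from this release:

```python
import dagster as dg
from dagster_fivetran import FivetranWorkspace, fivetran_assets

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)

@fivetran_assets(connector_id="<connector-id>", workspace=fivetran_workspace)
def my_connector_assets(context: dg.AssetExecutionContext, fivetran: FivetranWorkspace):
    # Materialize the connector's tables by kicking off a sync and polling to completion
    yield from fivetran.sync_and_poll(context=context)

defs = dg.Definitions(
    assets=[my_connector_assets],
    resources={"fivetran": fivetran_workspace},
)
```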

### Bugfixes

- Fixed database locking bug for the `ConsolidatedSqliteEventLogStorage`, which is mostly used for tests.
- [dagster-aws] Fixed a bug in the `ECSRunLauncher` that prevented it from accepting a user-provided task definition when `DAGSTER_CURRENT_IMAGE` was not set in the code location.
- [ui] Fixed an issue that would sometimes cause the asset graph to fail to render on initial load.
- [ui] Fix global auto-materialize tick timeline when paginating.

## 1.9.4 (core) / 0.25.4 (libraries)

### New
4 changes: 0 additions & 4 deletions docs/content/_navigation.json
@@ -1341,10 +1341,6 @@
{
"title": "Migrating from Airflow",
"path": "/guides/migrations/migrating-airflow-to-dagster"
},
{
"title": "Observe your Airflow pipelines with Dagster",
"path": "/guides/migrations/observe-your-airflow-pipelines-with-dagster"
}
]
},
Binary file modified docs/content/api/modules.json.gz
Binary file modified docs/content/api/searchindex.json.gz
Binary file modified docs/content/api/sections.json.gz
1 change: 1 addition & 0 deletions docs/content/concepts/metadata-tags/kind-tags.mdx
@@ -124,6 +124,7 @@ Some kinds are given a branded icon in the UI. We currently support nearly 200 u
| `go` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-go-color.svg" width={20} height={20} /> |
| `google` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-google-color.svg" width={20} height={20} /> |
| `googlecloud` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-googlecloud-color.svg" width={20} height={20} /> |
| `googledrive` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-googledrive-color.svg" width={20} height={20} /> |
| `googlesheets` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-googlesheets-color.svg" width={20} height={20} /> |
| `graphql` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-graphql-color.svg" width={20} height={20} /> |
| `greatexpectations` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-greatexpectations-color.svg" width={20} height={20} /> |
46 changes: 43 additions & 3 deletions docs/content/dagster-plus/deployment/azure/acr-user-code.mdx
@@ -89,18 +89,26 @@ First, we'll need to generate a service principal for GitHub Actions to use to a
az ad sp create-for-rbac --name "github-actions-acr" --role contributor --scopes /subscriptions/<your_azure_subscription_id>/resourceGroups/<your_resource_group>/providers/Microsoft.ContainerRegistry/registries/<your_acr_name>
```

This command will output a JSON object with the service principal details. Make sure to save the `appId`, `password`, and `tenant` values - we'll use them in the next step.
This command will output a JSON object with the service principal details. Make sure to save the `appId` and `password` values - we'll use them in the next step.
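
The output should look roughly like this (illustrative values):

```json
{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "github-actions-acr",
  "password": "<generated-secret>",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```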

### Add secrets to your repository

We'll add the service principal details as secrets in our repository. Go to your repository in GitHub, and navigate to `Settings` -> `Secrets`. Add the following secrets:

- `DAGSTER_CLOUD_API_TOKEN`: An agent token. For more details see [Managing agent tokens](/dagster-plus/account/managing-user-agent-tokens#managing-agent-tokens).
- `AZURE_CLIENT_ID`: The `appId` from the service principal JSON object.
- `AZURE_CLIENT_SECRET`: The `password` from the service principal JSON object.
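
As an alternative to the web UI, a sketch using the GitHub CLI (values are placeholders):

```bash
gh secret set DAGSTER_CLOUD_API_TOKEN --body "<your-agent-token>"
gh secret set AZURE_CLIENT_ID --body "<appId-from-service-principal>"
gh secret set AZURE_CLIENT_SECRET --body "<password-from-service-principal>"
```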

### Update the workflow
### Update the GitHub Actions workflow

Finally, we'll update the workflow to use the service principal details. Open `.github/workflows/dagster-cloud-deploy.yml` in your repository, and uncomment the section on Azure Container Registry. It should look like this:
For this step, open `.github/workflows/dagster-cloud-deploy.yml` in your repository with your preferred text editor to perform the changes below.

In the `env` section of the workflow, update the following variables:

- `DAGSTER_CLOUD_ORGANIZATION`: The name of your Dagster Cloud organization.
- `IMAGE_REGISTRY`: The URL of your Azure Container Registry: `<your-acr-name>.azurecr.io`.
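
As a sketch, with placeholder values (your workflow may define additional variables in this section):

```yaml
env:
  DAGSTER_CLOUD_ORGANIZATION: "my-org" # your Dagster Cloud organization name
  IMAGE_REGISTRY: "myregistry.azurecr.io" # your Azure Container Registry URL
```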

We'll update the workflow to use the Azure Container Registry by uncommenting its section and providing the principal details. It should look like this:

```yaml
# Azure Container Registry (ACR)
@@ -114,6 +122,34 @@ Finally, we'll update the workflow to use the service principal details. Open `.
password: ${{ secrets.AZURE_CLIENT_SECRET }}
```
Finally, update the tags in the "Build and upload Docker image" step to match the full URL of your image in ACR:
```yaml
- name: Build and upload Docker image for "quickstart_etl"
  if: steps.prerun.outputs.result != 'skip'
  uses: docker/build-push-action@v4
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE_REGISTRY }}/<image-name>:${{ env.IMAGE_TAG }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```
### Update the `dagster_cloud.yaml` build configuration to use the Azure Container Registry

Edit the `dagster_cloud.yaml` file in the root of your repository. Update the `build` section to use the Azure Container Registry, and provide an image name specific to the code location. This must match the registry and image name used in the previous step.

```yaml
locations:
  - location_name: quickstart_etl
    code_source:
      package_name: quickstart_etl.definitions
    build:
      directory: ./
      registry: <your-acr-name>.azurecr.io/<image-name>
```

### Push and run the workflow

Now, commit and push the changes to your repository. The GitHub Actions workflow should run automatically. You can check the status of the workflow in the `Actions` tab of your repository.
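
For example, assuming your default branch is `main` and an illustrative commit message:

```bash
git add .
git commit -m "Configure deploy workflow for ACR"
git push origin main
```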
@@ -133,3 +169,7 @@ alt="Dagster+ code locations page showing the new code location"
width={1152}
height={320}
/>

## Next steps

Now that you have your code location deployed, you can follow the guide [here](/dagster-plus/deployment/azure/blob-compute-logs) to set up logging in your AKS cluster.
83 changes: 52 additions & 31 deletions docs/content/dagster-plus/deployment/azure/blob-compute-logs.mdx
@@ -25,66 +25,83 @@ First, we'll enable the cluster to use workload identity. This will allow the AK
az aks update --resource-group <resource-group> --name <cluster-name> --enable-workload-identity
```

Then, we'll create a new managed identity for the AKS agent, and a new service account in our AKS cluster.
Then, we'll create a new managed identity for the AKS agent.

```bash
az identity create --resource-group <resource-group> --name agent-identity
```

We will need to find the name of the service account used by the Dagster+ Agent. If you used the [Dagster+ Helm chart](/dagster-plus/deployment/agents/kubernetes/configuring-running-kubernetes-agent), it should be `user-cloud-dagster-cloud-agent`. You can confirm by using this command:

```bash
kubectl get serviceaccount -n <dagster-agent-namespace>
```
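
The output should look something like this (illustrative; names and ages will vary):

```
NAME                             SECRETS   AGE
default                          0         14d
user-cloud-dagster-cloud-agent   0         14d
```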

Now we need to federate the managed identity with the service account used by the Dagster+ Agent.

```bash
az identity federated-credential create \
  --name dagster-agent-federated-id \
  --identity-name agent-identity \
  --resource-group <resource-group> \
  --issuer $(az aks show -g <resource-group> -n <aks-cluster-name> --query "oidcIssuerProfile.issuerUrl" -otsv) \
  --subject system:serviceaccount:<dagster-agent-namespace>:<dagster-agent-service-account>
```

You will need to obtain the client ID of this identity for the next few operations. Make sure to save this value:

```bash
az identity show -g <resource-group> -n agent-identity --query 'clientId' -otsv
```

Next, we need to grant the managed identity access to the storage account.

```bash
az role assignment create \
  --assignee <managed-identity-client-id> \
  --role "Storage Blob Data Contributor" \
  --scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv)
```
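
To confirm the assignment took effect, one option (a sketch using the same placeholders) is to list the role assignments on the storage account scope:

```bash
az role assignment list \
  --assignee <managed-identity-client-id> \
  --scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv) \
  --query "[].roleDefinitionName" -otsv
```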

You will need to add new annotations and labels in Kubernetes to enable the use of workload identities. If you're using the Dagster+ Helm chart, modify your `values.yaml` to add the following lines:

```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"

dagsterCloudAgent:
  labels:
    azure.workload.identity/use: "true"

workspace:
  labels:
    azure.workload.identity/use: "true"
```
<Note>
If you need to retrieve the values used by your Helm deployment, you can run:
`helm get values user-cloud > values.yaml`.
</Note>

Finally, update your Helm release with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

If everything is set up correctly, you should be able to run the following command and see an access token returned:

```bash
kubectl exec -n <dagster-agent-namespace> -it <pod-in-cluster> -- bash
# in the pod
apt update && apt install -y curl # install curl if missing, may vary depending on the base image
curl -H "Metadata:true" "http://169.254.169.254/metadata/identity/oauth2/token?resource=https://storage.azure.com/&api-version=2018-02-01"
```
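
The response should include a JSON payload with an access token, roughly like this (abridged, illustrative values):

```json
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs...",
  "expires_in": "86399",
  "resource": "https://storage.azure.com/",
  "token_type": "Bearer"
}
```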

## Step 2: Configure Dagster to use Azure Blob Storage

Once again, you need to update the Helm values to use Azure Blob Storage for logs. You can do this by editing the `values.yaml` file for your user-cloud deployment to include the following lines:

```yaml
computeLogs:
@@ -97,18 +114,22 @@
      container: mycontainer
      default_azure_credential:
        exclude_environment_credential: false
      prefix: dagster-logs
      local_dir: "/tmp/cool"
      upload_interval: 30
```

Finally, update your deployment with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

## Step 3: Update your code location to enable the use of the AzureBlobComputeLogManager

- Add `dagster-azure` to your `setup.py` file. This will allow you to import the `AzureBlobComputeLogManager` class.
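
For example, a minimal `setup.py` sketch (the package name assumes the quickstart project; use your code location's own package name):

```python
from setuptools import find_packages, setup

setup(
    name="quickstart_etl",  # illustrative; replace with your package name
    packages=find_packages(exclude=["quickstart_etl_tests"]),
    install_requires=[
        "dagster",
        "dagster-cloud",
        "dagster-azure",  # provides AzureBlobComputeLogManager
    ],
)
```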

## Step 4: Verify logs are being written to Azure Blob Storage

It's time to kick off a run in Dagster to test your new configuration. If following along with the quickstart repo, you should be able to kick off a run of the `all_assets_job`, which will generate logs for you to test against. Otherwise, use any job that emits logs. When you go to the stdout/stderr window of the run page, you should see a log file that directs you to the Azure Blob Storage container.

2 changes: 1 addition & 1 deletion docs/content/deployment/run-monitoring.mdx
@@ -39,7 +39,7 @@ When Dagster terminates a run, the run moves into CANCELING status and sends a t

## General run timeouts

After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or (Dagster+ deployment settings)\[dagster-plus/managing-deployments/deployment-settings-reference] to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.
After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or [Dagster+ deployment settings](/dagster-plus/managing-deployments/deployment-settings-reference) to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.

For example, to configure a maximum of 2 hours for every run in your deployment:
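
A sketch of the collapsed `dagster.yaml` snippet, using the field described above (7200 seconds = 2 hours):

```yaml
run_monitoring:
  enabled: true
  max_runtime_seconds: 7200
```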

1 change: 0 additions & 1 deletion docs/content/guides/migrations.mdx
@@ -13,4 +13,3 @@ Explore your options for migrating from other platforms to Dagster.
Curious how you can migrate your Airflow pipelines to Dagster?

- Learn how to perform [a lift-and-shift migration of Airflow to Dagster](/guides/migrations/migrating-airflow-to-dagster)
- Learn how to leverage the features of [Dagster and Airflow together using Dagster Pipes](/guides/migrations/observe-your-airflow-pipelines-with-dagster)
