
fix merge conflict
Signed-off-by: nikki everett <[email protected]>
neverett committed Dec 18, 2024
2 parents 1b4a76a + 910838e commit f17305e
Showing 363 changed files with 10,661 additions and 2,343 deletions.
3 changes: 2 additions & 1 deletion .gitattributes
@@ -1 +1,2 @@
*.py diff=python
**/uv.lock linguist-generated
28 changes: 28 additions & 0 deletions CHANGES.md
@@ -1,5 +1,33 @@
# Changelog

## 1.9.5 (core) / 0.25.5 (libraries)

### New

- The automatic run retry daemon has been updated so that there is a single source of truth for whether a run will be retried and whether the retry has been launched. Tags are now added to runs at failure time indicating whether the run will be retried by the automatic retry system. Once the automatic retry has been launched, the run ID of the retry is added to the original run.
- When canceling a backfill of a job, the backfill daemon will now cancel all runs launched by that backfill before marking the backfill as canceled.
- Dagster execution info (tags such as `dagster/run-id`, `dagster/code-location`, `dagster/user` and Dagster Cloud environment variables) typically attached to external resources are now available under `DagsterRun.dagster_execution_info`.
- `SensorReturnTypesUnion` is now exported for typing the output of sensor functions.
- [dagster-dbt] dbt seeds now get a valid code version (Thanks [@marijncv](https://github.com/marijncv)!).
- Manual and automatic retries of runs launched by backfills that occur while the backfill is still in progress are now incorporated into the backfill's status.
- Manual retries of runs launched by backfills are no longer considered part of the backfill if the backfill is complete when the retry is launched.
- [dagster-fivetran] Fivetran assets can now be materialized using the `FivetranWorkspace.sync_and_poll(...)` method in the definition of a `@fivetran_assets`-decorated function.
- [dagster-fivetran] `load_fivetran_asset_specs` has been updated to accept an instance of `DagsterFivetranTranslator` or a custom subclass.
- [dagster-fivetran] The `fivetran_assets` decorator was added. It can be used with the `FivetranWorkspace` resource and `DagsterFivetranTranslator` translator to load Fivetran tables for a given connector as assets in Dagster. The `build_fivetran_assets_definitions` factory can be used to create assets for all the connectors in your Fivetran workspace (see the sketch after this list).
- [dagster-aws] `ECSPipesClient.run` now waits up to 70 days for task completion (waiter parameters are configurable) (Thanks [@jenkoian](https://github.com/jenkoian)!)
- [dagster-dbt] Update dagster-dbt scaffold template to be compatible with uv (Thanks [@wingyplus](https://github.com/wingyplus)!).
- [dagster-airbyte] A `load_airbyte_cloud_asset_specs` function has been added. It can be used with the `AirbyteCloudWorkspace` resource and `DagsterAirbyteTranslator` translator to load your Airbyte Cloud connection streams as external assets in Dagster.
- [ui] Add an icon for the `icechunk` kind.
- [ui] Improved UI for manual sensor/schedule evaluation.
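
A sketch of the new decorator-based Fivetran API described in the entries above; the connector ID, credentials, and names are placeholders, not values from this release:

```python
import dagster as dg
from dagster_fivetran import FivetranWorkspace, fivetran_assets

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)

@fivetran_assets(connector_id="<connector-id>", workspace=fivetran_workspace)
def my_connector_assets(context: dg.AssetExecutionContext, fivetran: FivetranWorkspace):
    # Materialize the connector's tables by kicking off a sync and polling to completion
    yield from fivetran.sync_and_poll(context=context)

defs = dg.Definitions(
    assets=[my_connector_assets],
    resources={"fivetran": fivetran_workspace},
)
```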

### Bugfixes

- Fixed database locking bug for the `ConsolidatedSqliteEventLogStorage`, which is mostly used for tests.
- [dagster-aws] Fixed a bug in the `ECSRunLauncher` that prevented it from accepting a user-provided task definition when `DAGSTER_CURRENT_IMAGE` was not set in the code location.
- [ui] Fixed an issue that would sometimes cause the asset graph to fail to render on initial load.
- [ui] Fix global auto-materialize tick timeline when paginating.

## 1.9.4 (core) / 0.25.4 (libraries)

### New
4 changes: 0 additions & 4 deletions docs/content/_navigation.json
@@ -1341,10 +1341,6 @@
{
"title": "Migrating from Airflow",
"path": "/guides/migrations/migrating-airflow-to-dagster"
},
{
"title": "Observe your Airflow pipelines with Dagster",
"path": "/guides/migrations/observe-your-airflow-pipelines-with-dagster"
}
]
},
Binary file modified docs/content/api/modules.json.gz
Binary file modified docs/content/api/searchindex.json.gz
Binary file modified docs/content/api/sections.json.gz
1 change: 1 addition & 0 deletions docs/content/concepts/metadata-tags/kind-tags.mdx
@@ -124,6 +124,7 @@ Some kinds are given a branded icon in the UI. We currently support nearly 200 u
| `go` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-go-color.svg" width={20} height={20} /> |
| `google` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-google-color.svg" width={20} height={20} /> |
| `googlecloud` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-googlecloud-color.svg" width={20} height={20} /> |
| `googledrive` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-googledrive-color.svg" width={20} height={20} /> |
| `googlesheets` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-googlesheets-color.svg" width={20} height={20} /> |
| `graphql` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-graphql-color.svg" width={20} height={20} /> |
| `greatexpectations` | <Image src="/images/concepts/metadata-tags/kinds/icons/tool-greatexpectations-color.svg" width={20} height={20} /> |
46 changes: 43 additions & 3 deletions docs/content/dagster-plus/deployment/azure/acr-user-code.mdx
@@ -89,18 +89,26 @@ First, we'll need to generate a service principal for GitHub Actions to use to a
az ad sp create-for-rbac --name "github-actions-acr" --role contributor --scopes /subscriptions/<your_azure_subscription_id>/resourceGroups/<your_resource_group>/providers/Microsoft.ContainerRegistry/registries/<your_acr_name>
```

This command will output a JSON object with the service principal details. Make sure to save the `appId`, `password`, and `tenant` values - we'll use them in the next step.
This command will output a JSON object with the service principal details. Make sure to save the `appId` and `password` values - we'll use them in the next step.
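
The output should look roughly like this (illustrative values):

```json
{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "github-actions-acr",
  "password": "<generated-secret>",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```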

### Add secrets to your repository

We'll add the service principal details as secrets in our repository. Go to your repository in GitHub, and navigate to `Settings` -> `Secrets`. Add the following secrets:

- `DAGSTER_CLOUD_API_TOKEN`: An agent token. For more details see [Managing agent tokens](/dagster-plus/account/managing-user-agent-tokens#managing-agent-tokens).
- `AZURE_CLIENT_ID`: The `appId` from the service principal JSON object.
- `AZURE_CLIENT_SECRET`: The `password` from the service principal JSON object.
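
As an alternative to the web UI, a sketch using the GitHub CLI (values are placeholders):

```bash
gh secret set DAGSTER_CLOUD_API_TOKEN --body "<your-agent-token>"
gh secret set AZURE_CLIENT_ID --body "<appId-from-service-principal>"
gh secret set AZURE_CLIENT_SECRET --body "<password-from-service-principal>"
```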

### Update the workflow
### Update the GitHub Actions workflow

Finally, we'll update the workflow to use the service principal details. Open `.github/workflows/dagster-cloud-deploy.yml` in your repository, and uncomment the section on Azure Container Registry. It should look like this:
For this step, open `.github/workflows/dagster-cloud-deploy.yml` in your repository with your preferred text editor to perform the changes below.

In the `env` section of the workflow, update the following variables:

- `DAGSTER_CLOUD_ORGANIZATION`: The name of your Dagster Cloud organization.
- `IMAGE_REGISTRY`: The URL of your Azure Container Registry: `<your-acr-name>.azurecr.io`.
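
As a sketch, with placeholder values (your workflow may define additional variables in this section):

```yaml
env:
  DAGSTER_CLOUD_ORGANIZATION: "my-org" # your Dagster Cloud organization name
  IMAGE_REGISTRY: "myregistry.azurecr.io" # your Azure Container Registry URL
```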

We'll update the workflow to use the Azure Container Registry by uncommenting its section and providing the principal details. It should look like this:

```yaml
# Azure Container Registry (ACR)
@@ -114,6 +122,34 @@ Finally, we'll update the workflow to use the service principal details. Open `.
password: ${{ secrets.AZURE_CLIENT_SECRET }}
```
Finally, update the tags in the "Build and upload Docker image" step to match the full URL of your image in ACR:
```yaml
- name: Build and upload Docker image for "quickstart_etl"
  if: steps.prerun.outputs.result != 'skip'
  uses: docker/build-push-action@v4
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE_REGISTRY }}/<image-name>:${{ env.IMAGE_TAG }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```
### Update the `dagster_cloud.yaml` build configuration to use the Azure Container Registry

Edit the `dagster_cloud.yaml` file in the root of your repository. Update the `build` section to use the Azure Container Registry, and provide an image name specific to the code location. This must match the registry and image name used in the previous step.

```yaml
locations:
  - location_name: quickstart_etl
    code_source:
      package_name: quickstart_etl.definitions
    build:
      directory: ./
      registry: <your-acr-name>.azurecr.io/<image-name>
```

### Push and run the workflow

Now, commit and push the changes to your repository. The GitHub Actions workflow should run automatically. You can check the status of the workflow in the `Actions` tab of your repository.
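
For example, assuming your default branch is `main` and an illustrative commit message:

```bash
git add .
git commit -m "Configure deploy workflow for ACR"
git push origin main
```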
@@ -133,3 +169,7 @@ alt="Dagster+ code locations page showing the new code location"
width={1152}
height={320}
/>

## Next steps

Now that you have your code location deployed, you can follow the guide [here](/dagster-plus/deployment/azure/blob-compute-logs) to set up logging in your AKS cluster.
83 changes: 52 additions & 31 deletions docs/content/dagster-plus/deployment/azure/blob-compute-logs.mdx
@@ -25,66 +25,83 @@ First, we'll enable the cluster to use workload identity. This will allow the AK
az aks update --resource-group <resource-group> --name <cluster-name> --enable-workload-identity
```

Then, we'll create a new managed identity for the AKS agent, and a new service account in our AKS cluster.
Then, we'll create a new managed identity for the AKS agent.

```bash
az identity create --resource-group <resource-group> --name agent-identity
```

We will need to find the name of the service account used by the Dagster+ Agent. If you used the [Dagster+ Helm chart](/dagster-plus/deployment/agents/kubernetes/configuring-running-kubernetes-agent), it should be `user-cloud-dagster-cloud-agent`. You can confirm by using this command:

```bash
kubectl get serviceaccount -n <dagster-agent-namespace>
```
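
The output should look something like this (illustrative; names and ages will vary):

```
NAME                             SECRETS   AGE
default                          0         14d
user-cloud-dagster-cloud-agent   0         14d
```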

Now we need to federate the managed identity with the service account used by the Dagster+ Agent.

```bash
az identity federated-credential create \
  --name dagster-agent-federated-id \
  --identity-name agent-identity \
  --resource-group <resource-group> \
  --issuer $(az aks show -g <resource-group> -n <aks-cluster-name> --query "oidcIssuerProfile.issuerUrl" -otsv) \
  --subject system:serviceaccount:<dagster-agent-namespace>:<dagster-agent-service-account>
```

You will need to obtain the client ID of this identity for the next few operations. Make sure to save this value:

```bash
az identity show -g <resource-group> -n agent-identity --query 'clientId' -otsv
```

Next, we need to grant the managed identity access to the storage account.

```bash
az role assignment create \
  --assignee <managed-identity-client-id> \
  --role "Storage Blob Data Contributor" \
  --scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv)
```
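
To confirm the assignment took effect, one option (a sketch using the same placeholders) is to list the role assignments on the storage account scope:

```bash
az role assignment list \
  --assignee <managed-identity-client-id> \
  --scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv) \
  --query "[].roleDefinitionName" -otsv
```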

You will need to add new annotations and labels in Kubernetes to enable the use of workload identities. If you're using the Dagster+ Helm chart, modify your `values.yaml` to add the following lines:

```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"

dagsterCloudAgent:
  labels:
    azure.workload.identity/use: "true"

workspace:
  labels:
    azure.workload.identity/use: "true"
```
<Note>
If you need to retrieve the values used by your Helm deployment, you can run:
`helm get values user-cloud > values.yaml`.
</Note>

Finally, update your Helm release with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

If everything is set up correctly, you should be able to run the following command and see an access token returned:

```bash
kubectl exec -n <dagster-agent-namespace> -it <pod-in-cluster> -- bash
# in the pod
apt update && apt install -y curl # install curl if missing, may vary depending on the base image
curl -H "Metadata:true" "http://169.254.169.254/metadata/identity/oauth2/token?resource=https://storage.azure.com/&api-version=2018-02-01"
```
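
The response should include a JSON payload with an access token, roughly like this (abridged, illustrative values):

```json
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs...",
  "expires_in": "86399",
  "resource": "https://storage.azure.com/",
  "token_type": "Bearer"
}
```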

## Step 2: Configure Dagster to use Azure Blob Storage

Once again, you need to update the Helm values to use Azure Blob Storage for logs. You can do this by editing the `values.yaml` file for your user-cloud deployment to include the following lines:

```yaml
computeLogs:
@@ -97,18 +114,22 @@
      container: mycontainer
      default_azure_credential:
        exclude_environment_credential: false
      prefix: dagster-logs
      local_dir: "/tmp/cool"
      upload_interval: 30
```

Finally, update your deployment with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

## Step 3: Update your code location to enable the use of the AzureBlobComputeLogManager

- Add `dagster-azure` to your `setup.py` file. This will allow you to import the `AzureBlobComputeLogManager` class.
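
For example, a minimal `setup.py` sketch (the package name assumes the quickstart project; use your code location's own package name):

```python
from setuptools import find_packages, setup

setup(
    name="quickstart_etl",  # illustrative; replace with your package name
    packages=find_packages(exclude=["quickstart_etl_tests"]),
    install_requires=[
        "dagster",
        "dagster-cloud",
        "dagster-azure",  # provides AzureBlobComputeLogManager
    ],
)
```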

## Step 4: Verify logs are being written to Azure Blob Storage

It's time to kick off a run in Dagster to test your new configuration. If following along with the quickstart repo, you should be able to kick off a run of the `all_assets_job`, which will generate logs for you to test against. Otherwise, use any job that emits logs. When you go to the stdout/stderr window of the run page, you should see a log file that directs you to the Azure Blob Storage container.

2 changes: 1 addition & 1 deletion docs/content/deployment/run-monitoring.mdx
@@ -39,7 +39,7 @@ When Dagster terminates a run, the run moves into CANCELING status and sends a t

## General run timeouts

After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or (Dagster+ deployment settings)\[dagster-plus/managing-deployments/deployment-settings-reference] to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.
After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or [Dagster+ deployment settings](/dagster-plus/managing-deployments/deployment-settings-reference) to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.

For example, to configure a maximum of 2 hours for every run in your deployment:
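
A sketch of the collapsed `dagster.yaml` snippet, using the field described above (7200 seconds = 2 hours):

```yaml
run_monitoring:
  enabled: true
  max_runtime_seconds: 7200
```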

1 change: 0 additions & 1 deletion docs/content/guides/migrations.mdx
@@ -13,4 +13,3 @@ Explore your options for migrating from other platforms to Dagster.
Curious how you can migrate your Airflow pipelines to Dagster?

- Learn how to perform [a lift-and-shift migration of Airflow to Dagster](/guides/migrations/migrating-airflow-to-dagster)
- Learn how to leverage the features of [Dagster and Airflow together using Dagster Pipes](/guides/migrations/observe-your-airflow-pipelines-with-dagster)
