Commit

Merge branch 'master' into nikki/docs/guides-structure

neverett committed Dec 20, 2024
2 parents 81e054a + a14cb87 commit d45721c
Showing 140 changed files with 5,031 additions and 1,016 deletions.
@@ -28,7 +28,7 @@ def build_trigger_step(
     dagster_commit_hash = safe_getenv("BUILDKITE_COMMIT")
     step: TriggerStep = {
         "trigger": pipeline,
-        "label": f":link: {pipeline} from dagster@{dagster_commit_hash[:6]}",
+        "label": f":link: {pipeline} from dagster@{dagster_commit_hash[:10]}",
         "async": async_step,
         "build": {
             "env": env or {},
33 changes: 33 additions & 0 deletions CHANGES.md
@@ -1,5 +1,38 @@
# Changelog

## 1.9.6 (core) / 0.25.6 (libraries)

### New

- Updated `cronitor` pin to allow versions `>= 5.0.1` to enable use of `DayOfWeek` as 7. Cronitor `4.0.0` is still disallowed. (Thanks, [@joshuataylor](https://github.com/joshuataylor)!)
- Added flag `checkDbReadyInitContainer` to optionally disable db check initContainer.
- [ui] Added Google Drive icon for `kind` tags. (Thanks, [@dragos-pop](https://github.com/dragos-pop)!)
- [ui] Renamed the run lineage sidebar on the Run details page to `Re-executions`.
- [ui] Sensors and schedules that appear in the Runs page are now clickable.
- [ui] Runs targeting assets now show more of the assets in the Runs page.
- [dagster-airbyte] The destination type for an Airbyte asset is now added as a `kind` tag for display in the UI.
- [dagster-gcp] `DataprocResource` now receives an optional parameter `labels` to be attached to Dataproc clusters. (Thanks, [@thiagoazcampos](https://github.com/thiagoazcampos)!)
- [dagster-k8s] Added a `checkDbReadyInitContainer` flag to the Dagster Helm chart to allow disabling the default init container behavior. (Thanks, [@easontm](https://github.com/easontm)!)
- [dagster-k8s] K8s pod logs are now logged when a pod fails. (Thanks, [@apetryla](https://github.com/apetryla)!)
- [dagster-sigma] Introduced `build_materialize_workbook_assets_definition` which can be used to build assets that run materialize schedules for a Sigma workbook.
- [dagster-snowflake] `SnowflakeResource` and `SnowflakeIOManager` both accept `additional_snowflake_connection_args` config. This dictionary of arguments will be passed to the `snowflake.connector.connect` method. This config will be ignored if you are using the `sqlalchemy` connector.
- [helm] Added the ability to set user-deployments labels on k8s deployments as well as pods.

### Bugfixes

- Assets with self dependencies and `BackfillPolicy` are now evaluated correctly during backfills. Self dependent assets no longer result in serial partition submissions or disregarded upstream dependencies.
- Previously, the freshness check sensor would not re-evaluate freshness checks if an in-flight run was planning to evaluate that check. Now, the freshness check sensor will kick off an independent run of the check, even if there's already an in-flight run, as long as the freshness check can potentially fail.
- Previously, if the freshness check was in a failing state, the sensor would wait for a run to update the freshness check before re-evaluating. Now, if there's a materialization later than the last evaluation of the freshness check and no planned evaluation, we will re-evaluate the freshness check automatically.
- [ui] Fixed run log streaming for runs with a large volume of logs.
- [ui] Fixed a bug in the Backfill Preview where a loading spinner would spin forever if an asset had no valid partitions targeted by the backfill.
- [dagster-aws] `PipesCloudWatchMessageReader` correctly identifies streams which are not ready yet and doesn't fail on `ThrottlingException`. (Thanks, [@jenkoian](https://github.com/jenkoian)!)
- [dagster-fivetran] Column metadata can now be fetched for Fivetran assets using `FivetranWorkspace.sync_and_poll(...).fetch_column_metadata()`.
- [dagster-k8s] The k8s client now waits for the main container to be ready instead of only waiting for sidecar init containers. (Thanks, [@OrenLederman](https://github.com/OrenLederman)!)

### Documentation

- Fixed a typo in the `dlt_assets` API docs. (Thanks, [@zilto](https://github.com/zilto)!)

## 1.9.5 (core) / 0.25.5 (libraries)

### New
@@ -391,11 +391,6 @@ height={1638}

### Limiting op/asset concurrency across runs

<Note>
This feature is experimental and is only supported with Postgres/MySQL
storages.
</Note>

#### For specific ops/assets

Limits can be specified on the Dagster instance using the special op tag `dagster/concurrency_key`. If this instance limit would be exceeded by launching an op/asset, then the op/asset will be queued.
4 changes: 4 additions & 0 deletions docs/docs-beta/CONTRIBUTING.md
@@ -102,6 +102,10 @@ After:
| `DAGSTER_CLOUD_DEPLOYMENT_NAME` | The name of the Dagster+ deployment. <br/><br/> **Example:** `prod`. |
| `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` | `1` if the deployment is a [branch deployment](/dagster-plus/features/ci-cd/branch-deployments/index.md). |

#### Line breaks and lists in tables

[Use HTML](https://www.markdownguide.org/hacks/#table-formatting) to add line breaks and lists to tables.

### Whitespace via `{" "}`

Forcing empty space using the `{" "}` interpolation is not supported, and must be removed.
@@ -1,7 +1,305 @@
---
title: dagster_cloud.yaml reference
sidebar_position: 200
unlisted: true
---

{/* TODO move content from https://docs.dagster.io/dagster-plus/managing-deployments/dagster-cloud-yaml */}
:::note
This reference is applicable to Dagster+.
:::

<table
className="table"
style={{
width: "100%",
}}
>
<tbody>
<tr>
<td
style={{
width: "15%",
}}
>
<strong>Name</strong>
</td>
<td>dagster_cloud.yaml</td>
</tr>
<tr>
<td
style={{
width: "15%",
}}
>
<strong>Status</strong>
</td>
<td>Active</td>
</tr>
<tr>
<td
style={{
width: "15%",
}}
>
<strong>Required</strong>
</td>
<td>Required for Dagster+</td>
</tr>
<tr>
<td
style={{
width: "15%",
}}
>
<strong>Description</strong>
</td>
<td>
{" "}
Similar to the <code>workspace.yaml</code> in open source to define code
locations for Dagster+.
</td>
</tr>
<tr>
<td
style={{
width: "15%",
}}
>
<strong>Uses</strong>
</td>
<td>
Defines multiple code locations for Dagster+. For Hybrid deployments, this file can be used
<a href="/dagster-plus/managing-deployments/setting-environment-variables-agents"> to manage
environment variables/secrets.</a>
<ul></ul>
</td>
</tr>
</tbody>
</table>

## File location

The `dagster_cloud.yaml` file should be placed in the root of your Dagster project. Below is an example of a file structure modified from the [Dagster+ ETL quickstart](https://github.com/dagster-io/dagster/tree/master/examples/quickstart_etl).

```shell
quickstart_etl
├── README.md
├── quickstart_etl
│   ├── __init__.py
│   ├── assets
│   ├── docker_image
├── ml_project
│   ├── quickstart_ml
│   │   ├── __init__.py
│   │   ├── ml_assets
├── random_assets.py
├── quickstart_etl_tests
├── dagster_cloud.yaml
├── pyproject.toml
├── setup.cfg
└── setup.py
```

If your repository contains multiple Dagster projects in subdirectories (also known as a monorepo), add the `dagster_cloud.yaml` file to the root directory containing the Dagster projects.

## File structure

Settings are formatted using YAML. For example, using the file structure above:

```yaml
# dagster_cloud.yaml

locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
    build:
      directory: ./quickstart_etl
      registry: localhost:5000/docker_image
  - location_name: ml-pipeline
    code_source:
      package_name: quickstart_ml
    working_directory: ./ml_project
    executable_path: venvs/path/to/ml_tensorflow/bin/python
  - location_name: my_random_assets
    code_source:
      python_file: random_assets.py
    container_context:
      k8s:
        env_vars:
          - database_name
          - database_username=hooli_testing
        env_secrets:
          - database_password
```
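Because flattened indentation is a common way for this file to break, it can help to sanity-check the structure by parsing it with PyYAML (a sketch under assumptions: PyYAML is installed, and the checks shown are illustrative, not the official Dagster+ schema validation):

```python
# Illustrative structural check for a dagster_cloud.yaml file.
# Not the official schema; it only verifies the basics described on this page.
import yaml

EXAMPLE = """
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
"""

config = yaml.safe_load(EXAMPLE)
for loc in config["locations"]:
    # Every location needs a name and exactly one code source key.
    assert "location_name" in loc
    source_keys = {"package_name", "python_file", "module_name"}
    assert len(source_keys & set(loc["code_source"])) == 1

print([loc["location_name"] for loc in config["locations"]])
```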

## Settings

The `dagster_cloud.yaml` file contains a single top-level key, `locations`. This key accepts a list of code locations; for each code location, you can configure the following:

- [Location name](#location-name)
- [Code source](#code-source)
- [Working directory](#working-directory)
- [Build](#build)
- [Python executable](#python-executable)
- [Container context](#container-context)

### Location name

**This key is required.** The `location_name` key specifies the name of the code location. The location name will always be paired with a [code source](#code-source).

```yaml
# dagster_cloud.yaml
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
```

| Property | Description | Format |
|-----------------|----------------------------------------------------------------------------------------|----------|
| `location_name` | The name of your code location that will appear in the Dagster UI Code locations page. | `string` |

### Code source

**This section is required.** The `code_source` defines how a code location is sourced.

A `code_source` key must contain either a `module_name`, `package_name`, or `python_file` parameter that specifies where to find the definitions in the code location.

<Tabs>
<TabItem value="Single code location">

```yaml
# dagster_cloud.yaml
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
```

</TabItem>
<TabItem value="Multiple code locations">

```yaml
# dagster_cloud.yaml
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
  - location_name: machine_learning
    code_source:
      python_file: ml/ml_model.py
```

</TabItem>
</Tabs>

| Property | Description | Format |
|----------------------------|-----------------------------------------------------------------------------------|--------------------------|
| `code_source.package_name` | The name of a package containing Dagster code | `string` (folder name) |
| `code_source.python_file` | The name of a Python file containing Dagster code (e.g. `analytics_pipeline.py` ) | `string` (.py file name) |
| `code_source.module_name` | The name of a Python module containing Dagster code (e.g. `analytics_etl`) | `string` (module name) |

### Working directory

Use the `working_directory` setting to load Dagster code from a different directory than the root of your code repository. This setting allows you to specify the directory you want to load your code from.

Consider the following project:

```shell
quickstart_etl
├── README.md
├── project_directory
│   ├── quickstart_etl
│   │   ├── __init__.py
│   │   ├── assets
│   ├── quickstart_etl_tests
├── dagster_cloud.yaml
├── pyproject.toml
├── setup.cfg
└── setup.py
```

To load from `/project_directory`, the `dagster_cloud.yaml` code location would look like this:

```yaml
# dagster_cloud.yaml
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
    working_directory: ./project_directory
```

| Property | Description | Format |
|---------------------|-------------------------------------------------------------------------|-----------------|
| `working_directory` | The path of the directory that Dagster should load the code source from | `string` (path) |

### Build

The `build` section contains two parameters:

- `directory` - Setting a build directory is useful if your `setup.py` or `requirements.txt` is in a subdirectory instead of the project root. This is common if you have multiple Python modules within a single Dagster project.
- `registry` - **Applicable only to Hybrid deployments.** Specifies the Docker registry to push the code location to.

In the example below, the Docker image for the code location is built from the root directory, and the registry and image are defined:

```yaml
# dagster_cloud.yaml
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
    build:
      directory: ./
      registry: your-docker-image-registry/image-name # e.g. localhost:5000/myimage
```


| Property | Description | Format | Default |
|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------|---------|
| `build.directory` | The path to the directory in your project that you want to deploy. If there are subdirectories, you can specify the path to only deploy a specific project directory. | `string` (path) | `.` |
| `build.registry` | **Applicable to Hybrid deployments.** The Docker registry to push your code location to | `string` (docker registry) | |


### Python executable

For Dagster+ Hybrid deployments, Dagster uses the Python executable installed globally in the image, or, if you use the local agent, the default Python executable on the local system. To use a different Python executable, specify it with the `executable_path` setting. Using different Python executables for different code locations can be useful.

{/* For Dagster+ Serverless deployments, you can specify a different Python version by [following these instructions](/dagster-plus/deployment/deployment-types/serverless/runtime-environment#python-version). */}
For Dagster+ Serverless deployments, you can specify a different Python version by [following these instructions](/todo).

```yaml
# dagster_cloud.yaml
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
    executable_path: venvs/path/to/dataengineering_spark_team/bin/python
  - location_name: machine_learning
    code_source:
      python_file: ml_model.py
    executable_path: venvs/path/to/ml_tensorflow/bin/python
```

| Property | Description | Format |
|-------------------|-----------------------------------------------|-----------------|
| `executable_path` | The file path of the Python executable to use | `string` (path) |

### Container context

If using Hybrid deployment, you can define additional configuration options for code locations using the `container_context` parameter. Depending on the Hybrid agent you're using, the configuration settings under `container_context` will vary.

Refer to the configuration reference for your agent for more info:

{/* - [Docker agent configuration reference](/dagster-plus/deployment/agents/docker/configuration-reference) */}
- [Docker agent configuration reference](/todo)
{/* - [Amazon ECS agent configuration reference](/dagster-plus/deployment/agents/amazon-ecs/configuration-reference) */}
- [Amazon ECS agent configuration reference](/todo)
{/* - [Kubernetes agent configuration reference](/dagster-plus/deployment/agents/kubernetes/configuration-reference) */}
- [Kubernetes agent configuration reference](/todo)
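
As a brief illustration, the shape is the same as in the full example earlier on this page (a sketch; which keys are valid under `container_context` depends on your agent type):

```yaml
# dagster_cloud.yaml (fragment)
locations:
  - location_name: data-eng-pipeline
    code_source:
      package_name: quickstart_etl
    container_context:
      k8s:
        env_vars:
          - database_name
        env_secrets:
          - database_password
```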