Merge branch 'master' into nikki/docs/hybrid-docs

dagster-io · Dec 17, 2024 · 4967813
2 parents 9e5f445 + 809d837
commit 4967813

Showing 78 changed files with 2,428 additions and 514 deletions.
diff --git a/docs/content/deployment/run-monitoring.mdx b/docs/content/deployment/run-monitoring.mdx
@@ -39,7 +39,7 @@ When Dagster terminates a run, the run moves into CANCELING status and sends a t
 
 ## General run timeouts
 
-After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or (Dagster+ deployment settings)\[dagster-plus/managing-deployments/deployment-settings-reference] to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.
+After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or [Dagster+ deployment settings](/dagster-plus/managing-deployments/deployment-settings-reference) to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.
 
 For example, to configure a maximum of 2 hours for every run in your deployment:
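
The hunk ends before the original example, so the following is only a rough sketch of the arithmetic and of the shape of the two settings named above. It assumes that the `enabled` key sits next to `max_runtime_seconds` under `run_monitoring` and that the tag value is passed as a string; the helper `hours_to_seconds` is hypothetical and not part of Dagster.

```python
# Hypothetical helper (not a Dagster API): convert hours to whole seconds.
def hours_to_seconds(hours: float) -> int:
    return int(hours * 60 * 60)


# A 2-hour limit expressed in seconds.
max_runtime_seconds = hours_to_seconds(2)

# Assumed shape of the deployment-wide setting, mirroring the
# `run_monitoring.max_runtime_seconds` field path from the text.
run_monitoring_settings = {
    "run_monitoring": {
        "enabled": True,  # assumption: run monitoring must be on for the timeout to apply
        "max_runtime_seconds": max_runtime_seconds,
    }
}

# Per-run override using the `dagster/max_runtime` tag (value in seconds).
# Assumption: the tag value is given as a string.
run_tags = {"dagster/max_runtime": str(max_runtime_seconds)}

print(run_monitoring_settings)
print(run_tags)
```

The dictionary only mirrors the field path; in a real deployment the same keys would live in `dagster.yaml` or in the Dagster+ deployment settings.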
 

diff --git a/docs/docs-beta/CONTRIBUTING.md b/docs/docs-beta/CONTRIBUTING.md
@@ -80,20 +80,27 @@ Before:
 
 ```
 <ReferenceTable>
-  <ReferenceTableItem propertyName="container_context.ecs.env_vars">
-    A list of keys or key-value pairs to include in the task. If a value is not
-    specified, the value will be pulled from the agent task.
-    <br />
-    In the example above, <code>FOO_ENV_VAR</code> will be set to{" "}
-    <code>foo_value</code> and <code>BAR_ENV_VAR</code> will be set to whatever
-    value it has in the agent task.
+  <ReferenceTableItem propertyName="DAGSTER_CLOUD_DEPLOYMENT_NAME">
+    The name of the Dagster+ deployment. For example, <code>prod</code>.
+  </ReferenceTableItem>
+  <ReferenceTableItem propertyName="DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT">
+    If <code>1</code>, the deployment is a{" "}
+    <a href="/dagster-plus/managing-deployments/branch-deployments">
+      branch deployment
+    </a>
+    . Refer to the <a href="#reserved-branch-deployment-variables">
+      Branch Deployment variables section
+    </a> for a list of variables available in branch deployments.
   </ReferenceTableItem>
 </ReferenceTable>
 ```
 
 After:
 
-_There is not a replacement at this point in time..._
+| Key | Value |
+|---|---|
+| `DAGSTER_CLOUD_DEPLOYMENT_NAME` | The name of the Dagster+ deployment. <br/><br/>  **Example:** `prod`. |
+| `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` | `1` if the deployment is a [branch deployment](/dagster-plus/features/ci-cd/branch-deployments/index.md). |
 
 ### Whitespace via `{" "}`
 

diff --git a/...docs/dagster-plus/deployment/deployment-types/serverless/ci-cd-in-serverless.md b/...docs/dagster-plus/deployment/deployment-types/serverless/ci-cd-in-serverless.md
@@ -83,5 +83,3 @@ dagster-cloud serverless deploy-python-executable ./my-dagster-project \
 
 </TabItem>
 </Tabs>
-
----
diff --git a/docs/docs-beta/docs/dagster-plus/deployment/deployment-types/serverless/index.md b/docs/docs-beta/docs/dagster-plus/deployment/deployment-types/serverless/index.md
@@ -8,9 +8,7 @@ sidebar_position: 10
 
 Dagster+ Serverless is a fully managed version of Dagster+ and is the easiest way to get started with Dagster. With a Serverless deployment, you can run your Dagster jobs without spinning up any infrastructure yourself.
 
----
-
-## When to choose Serverless \{#when-to-choose-serverless}
+## Serverless vs Hybrid
 
 Serverless works best with workloads that primarily orchestrate other services or perform light computation. Most workloads fit into this category, especially those that orchestrate third-party SaaS products like cloud data warehouses and ETL tools.
 
@@ -21,9 +19,7 @@ If any of the following are applicable, you should select [Hybrid deployment](/d
 - You need to distribute computation across many nodes for a single run. Dagster+ runs currently execute on a single node with 4 CPUs
 - You don't want to add Dagster Labs as a data processor
 
----
-
-## Limitations \{#limitations}
+## Limitations
 
 Serverless is subject to the following limitations:
 
@@ -36,8 +32,6 @@ Serverless is subject to the following limitations:
 
 Dagster+ Pro customers may request a quota increase by [contacting Sales](https://dagster.io/contact).
 
----
-
 ## Next steps
 
-To start using Dagster+ Serverless, follow our [Getting started with Dagster+](/dagster-plus/getting-started) guide.
+To start using Dagster+ Serverless, follow the steps in [Getting started with Dagster+](/dagster-plus/getting-started).
diff --git a/...-beta/docs/dagster-plus/deployment/deployment-types/serverless/run-isolation.md b/...-beta/docs/dagster-plus/deployment/deployment-types/serverless/run-isolation.md
@@ -15,8 +15,6 @@ To follow the steps in this guide, you'll need:
 - An understanding of [Dagster+ deployment settings](/dagster-plus/deployment/management/settings/deployment-settings)
 </details>
 
----
-
 ## Differences between isolated and non-isolated runs
 
 - [**Isolated runs**](#isolated-runs-default) execute in their own container. They're the default and are intended for production and compute-heavy use cases.

diff --git a/...docs/dagster-plus/deployment/deployment-types/serverless/runtime-environment.md b/...docs/dagster-plus/deployment/deployment-types/serverless/runtime-environment.md
@@ -7,13 +7,13 @@ sidebar_position: 100
 By default, Dagster+ Serverless packages your code as PEX files and deploys them on Docker images. Using PEX files significantly reduces the time to deploy since it does not require building a new Docker image and provisioning a new container for every code change. However, you can customize the Serverless runtime environment in various ways:
 
 - [Add dependencies](#add-dependencies)
-- [Use a different Python version](#python-version)
-- [Use a different base image](#base-image)
-- [Include data files](#data-files)
-- [Disable PEX deploys](#disable-pex)
-- [Use private Python packages](#private-packages)
+- [Use a different Python version](#use-a-different-python-version)
+- [Use a different base image](#use-a-different-base-image)
+- [Include data files](#include-data-files)
+- [Disable PEX deploys](#disable-pex-deploys)
+- [Use private Python packages](#use-private-python-packages)
 
-## Add dependencies \{#add-dependencies}
+## Add dependencies
 
 You can add dependencies by including the corresponding Python libraries in your Dagster project's `setup.py` file. These should follow [PEP 508](https://peps.python.org/pep-0508/).
 
@@ -39,9 +39,9 @@ setup(
 )
 ```
 
-To add a package from a private GitHub repository, see: [Use private Python packages](#private-packages)
+To add a package from a private GitHub repository, see [Use private Python packages](#use-private-python-packages).
 
-## Use a different Python version \{#python-version}
+## Use a different Python version
 
 The default Python version for Dagster+ Serverless is Python 3.9. Python versions 3.10 through 3.12 are also supported. You can specify the Python version you want to use in your GitHub or GitLab workflow, or by using the `dagster-cloud` CLI.
 
@@ -70,7 +70,7 @@ dagster-cloud serverless deploy-python-executable --python-version=3.11 --locati
 </TabItem>
 </Tabs>
 
-## Use a different base image \{#base-image}
+## Use a different base image
 
 Dagster+ runs your code on a Docker image that we build as follows:
 
@@ -117,7 +117,7 @@ Setting a custom base image isn't supported for GitLab CI/CD workflows out of th
     </TabItem>
     </Tabs>
 
-## Include data files \{#data-files}
+## Include data files
 
 To add data files to your deployment, use the [Data Files Support](https://setuptools.pypa.io/en/latest/userguide/datafiles.html) built into Python's `setup.py`. This requires adding a `package_data` or `include_package_data` keyword in the call to `setup()` in `setup.py`. For example, given this directory structure:
 
@@ -134,7 +134,7 @@ To add data files to your deployment, use the [Data Files Support](https://setup
 If you want to include the data folder, modify your `setup.py` to add the `package_data` line:
 <CodeExample filePath="dagster-plus/deployment/deployment-types/serverless/runtime-environment/data_files_setup.py" language="Python" title="Loading data files in setup.py" />
 
-## Disable PEX deploys \{#disable-pex}
+## Disable PEX deploys
 
 You have the option to disable PEX-based deploys and deploy using a Docker image instead of PEX. You can disable PEX in your GitHub or GitLab workflow, or by using the `dagster-cloud` CLI.
 
@@ -200,7 +200,7 @@ Setting a custom base image isn't supported for GitLab CI/CD workflows out of th
 </TabItem>
 </Tabs>
 
-## Use private Python packages \{#private-packages}
+## Use private Python packages
 
 If you use PEX deploys in your workflow (`ENABLE_FAST_DEPLOYS: 'true'`), the following steps can install a package from a private GitHub repository, e.g. `my-org/private-repo`, as a dependency:
 

diff --git a/.../docs-beta/docs/dagster-plus/deployment/deployment-types/serverless/security.md b/.../docs-beta/docs/dagster-plus/deployment/deployment-types/serverless/security.md
@@ -32,8 +32,6 @@ To prevent this, you can use [another I/O manager](/guides/build/configure/io-ma
 You must have [boto3](https://pypi.org/project/boto3/) or `dagster-cloud[serverless]` installed as a project dependency; otherwise, Dagster+ managed storage can fail and silently fall back to using the default I/O manager.
 :::
 
-## Adding environment variables and secrets \{#adding-secrets}
+## Adding environment variables and secrets
 
 Often you'll need to securely access secrets from your jobs. Dagster+ supports several methods for adding secrets—refer to the [Dagster+ environment variables documentation](/dagster-plus/deployment/management/environment-variables) for more information.
-
----
diff --git a/docs/docs-beta/docs/dagster-plus/getting-started.md b/docs/docs-beta/docs/dagster-plus/getting-started.md
@@ -2,12 +2,16 @@
 title: "Getting started with Dagster+"
 ---
 
-First [create a Dagster+ organization](https://dagster.plus/signup). Note: you can sign up with:
+To get started with Dagster+, you will need to create a Dagster+ organization and choose your deployment type (Serverless or Hybrid).
+
+## Create a Dagster+ organization
+
+First, [create a Dagster+ organization](https://dagster.plus/signup). You can sign up with:
 - a Google email address
 - a GitHub account
-- a one-time email link, great if you are using a corporate email. You can setup SSO after completing these steps.
+- a one-time email link (ideal if you are using a corporate email). You can set up SSO after completing these steps.
 
-Next, pick your deployment type. Not sure?
+## Choose your deployment type
 
 - [Dagster+ Serverless](/dagster-plus/deployment/deployment-types/serverless) is the easiest way to get started and is great for teams with limited DevOps support. In Dagster+ Serverless, your Dagster code is executed in Dagster+. You will need to be okay [giving Dagster+ the credentials](/dagster-plus/deployment/management/environment-variables) to connect to the tools you want to orchestrate.
 
@@ -20,27 +24,25 @@ The remaining steps depend on your deployment type.
 
 We recommend following the steps in Dagster+ to add a new project.
 
-![Screenshot of Dagster+ serverless NUX](/img/placeholder.svg)
-
-The Dagster+ on-boarding will guide you through:
+The Dagster+ onboarding will guide you through:
 - creating a Git repository containing your Dagster code
 - setting up the necessary CI/CD actions to deploy that repository to Dagster+
 
 :::tip
-If you don't have any Dagster code yet, you will have the option to select an example quickstart project or import an existing dbt project
+If you don't have any Dagster code yet, you can select an example project or import an existing dbt project.
 :::
 
 See the guide on [adding code locations](/dagster-plus/deployment/code-locations) for details.
 </TabItem>
 
 <TabItem value="hybrid" label="Dagster+ Hybrid">
 
-## Install a Dagster+ Hybrid agent
+**Install a Dagster+ Hybrid agent**
 
-Follow [these guides](/dagster-plus/deployment/deployment-types/hybrid) for installing a Dagster+ Hybrid agent. Not sure which agent to pick? We recommend using the Dagster+ Kubernetes agent in most cases.
+Follow [these guides](/dagster-plus/deployment/deployment-types/hybrid) for installing a Dagster+ Hybrid agent. If you're not sure which agent to use, we recommend the [Dagster+ Kubernetes agent](/dagster-plus/deployment/deployment-types/hybrid/kubernetes/index.md) in most cases.
 
 
-## Setup CI/CD
+**Set up CI/CD**
 
 In most cases, your CI/CD process will be responsible for:
 - building your Dagster code into a Docker image

diff --git a/examples/project_atproto_dashboard/.env.example b/examples/project_atproto_dashboard/.env.example
@@ -0,0 +1,17 @@
+AWS_ENDPOINT_URL=
+AWS_ACCESS_KEY_ID=
+AWS_SECRET_ACCESS_KEY=
+AWS_BUCKET_NAME=
+AWS_ACCOUNT_ID=
+
+MOTHERDUCK_TOKEN=
+
+BSKY_LOGIN=
+BSKY_APP_PASSWORD=
+
+DBT_TARGET=
+
+AZURE_POWERBI_CLIENT_ID=
+AZURE_POWERBI_CLIENT_SECRET=
+AZURE_POWERBI_TENANT_ID=
+AZURE_POWERBI_WORKSPACE_ID=
diff --git a/examples/project_atproto_dashboard/.gitignore b/examples/project_atproto_dashboard/.gitignore
@@ -0,0 +1,5 @@
+tmp*/
+storage/
+schedules/
+history/
+atproto-session.txt
diff --git a/examples/project_atproto_dashboard/README.md b/examples/project_atproto_dashboard/README.md
@@ -0,0 +1,52 @@
+# project_atproto_dashboard
+
+An end-to-end demonstration of ingesting data from the ATProto API, modeling it with dbt, and presenting it with Power BI.
+
+![Architecture Diagram](./architecture-diagram.png)
+
+![Project asset lineage](./lineage.svg)
+
+## Features used
+
+1. Ingestion of data-related Bluesky posts
+   - Dynamic partitions
+   - Declarative automation
+   - Concurrency limits
+2. Modelling data using _dbt_
+3. Representing data in a dashboard
+
+## Getting started
+
+### Environment Setup
+
+Ensure the following environment variables have been populated in your `.env` file. Start by copying the
+template.
+
+```
+cp .env.example .env
+```
+
+And then populate the fields.
+
+### Development
+
+Install the project dependencies:
+
+    pip install -e ".[dev]"
+
+Start Dagster:
+
+    DAGSTER_HOME=$(pwd) dagster dev
+
+### Unit testing
+
+Tests are in the `project_atproto_dashboard_tests` directory and you can run tests using `pytest`:
+
+    pytest project_atproto_dashboard_tests
+
+## Resources
+
+- https://docs.bsky.app/docs/tutorials/viewing-feeds
+- https://docs.bsky.app/docs/advanced-guides/rate-limits
+- https://atproto.blue/en/latest/atproto_client/auth.html#session-string
+- https://tenacity.readthedocs.io/en/latest/#waiting-before-retrying
diff --git a/examples/project_atproto_dashboard/architecture-diagram.png b/examples/project_atproto_dashboard/architecture-diagram.png
diff --git a/examples/project_atproto_dashboard/dagster.yaml b/examples/project_atproto_dashboard/dagster.yaml
@@ -0,0 +1,6 @@
+run_coordinator:
+  module: dagster.core.run_coordinator
+  class: QueuedRunCoordinator
+
+concurrency:
+  default_op_concurrency_limit: 1
diff --git a/examples/project_atproto_dashboard/dbt_project/.gitignore b/examples/project_atproto_dashboard/dbt_project/.gitignore
diff --git a/examples/project_atproto_dashboard/dbt_project/.sqlfluff b/examples/project_atproto_dashboard/dbt_project/.sqlfluff
@@ -0,0 +1,2 @@
+[sqlfluff:rules:capitalisation.keywords]
+capitalisation_policy = upper
diff --git a/examples/project_atproto_dashboard/dbt_project/dbt_project.yml b/examples/project_atproto_dashboard/dbt_project/dbt_project.yml
@@ -0,0 +1,13 @@
+name: "dbt_project"
+version: "1.0.0"
+config-version: 2
+
+profile: "bluesky"
+
+target-path: "target"
+clean-targets:
+  - "target"
+  - "dbt_packages"
+
+models:
+  +materialized: table
diff --git a/examples/project_atproto_dashboard/dbt_project/models/analysis/activity_over_time.sql b/examples/project_atproto_dashboard/dbt_project/models/analysis/activity_over_time.sql
@@ -0,0 +1,14 @@
+WITH final AS (
+    SELECT
+        date_trunc('day', created_at) AS post_date,
+        count(DISTINCT post_text) AS unique_posts,
+        count(DISTINCT author_handle) AS active_authors,
+        sum(likes) AS total_likes,
+        sum(replies) AS total_comments,
+        sum(quotes) AS total_quotes
+    FROM {{ ref("latest_feed") }}
+    GROUP BY date_trunc('day', created_at)
+    ORDER BY date_trunc('day', created_at) DESC
+)
+
+SELECT * FROM final
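
To make the aggregation in `activity_over_time.sql` easier to follow, here is a minimal pure-Python sketch of the same per-day grouping. The sample rows are invented for illustration and only reuse the column names the model reads from `latest_feed`; the sketch also assumes no NULL values, which SQL's `count(DISTINCT ...)` and `sum(...)` would otherwise skip.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; only the column names come from the model.
posts = [
    {"created_at": datetime(2024, 12, 16, 9, 30), "post_text": "dbt is great",
     "author_handle": "alice", "likes": 3, "replies": 1, "quotes": 0},
    {"created_at": datetime(2024, 12, 16, 18, 0), "post_text": "dbt is great",
     "author_handle": "bob", "likes": 2, "replies": 0, "quotes": 1},
    {"created_at": datetime(2024, 12, 17, 8, 0), "post_text": "hello dagster",
     "author_handle": "alice", "likes": 5, "replies": 2, "quotes": 0},
]


def activity_over_time(rows):
    """Group rows by calendar day and aggregate as the SQL model does."""
    buckets = defaultdict(
        lambda: {"texts": set(), "authors": set(), "likes": 0, "replies": 0, "quotes": 0}
    )
    for row in rows:
        day = row["created_at"].date()  # stands in for date_trunc('day', created_at)
        bucket = buckets[day]
        bucket["texts"].add(row["post_text"])
        bucket["authors"].add(row["author_handle"])
        bucket["likes"] += row["likes"]
        bucket["replies"] += row["replies"]
        bucket["quotes"] += row["quotes"]

    result = [
        {
            "post_date": day,
            "unique_posts": len(bucket["texts"]),      # count(DISTINCT post_text)
            "active_authors": len(bucket["authors"]),  # count(DISTINCT author_handle)
            "total_likes": bucket["likes"],
            "total_comments": bucket["replies"],       # the model names sum(replies) total_comments
            "total_quotes": bucket["quotes"],
        }
        for day, bucket in buckets.items()
    ]
    # ORDER BY post_date DESC
    result.sort(key=lambda r: r["post_date"], reverse=True)
    return result


summary = activity_over_time(posts)
for line in summary:
    print(line)
```

Note that two posts with identical text on the same day count once toward `unique_posts`, because the model deduplicates on `post_text` rather than on a post identifier.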