Commit

Merge branch 'master' into nikki/docs/hybrid-docs
neverett committed Dec 17, 2024
2 parents 9e5f445 + 809d837 commit 4967813
Showing 78 changed files with 2,428 additions and 514 deletions.
2 changes: 1 addition & 1 deletion docs/content/deployment/run-monitoring.mdx
@@ -39,7 +39,7 @@ When Dagster terminates a run, the run moves into CANCELING status and sends a t

## General run timeouts

After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or (Dagster+ deployment settings)\[dagster-plus/managing-deployments/deployment-settings-reference] to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.
After a run is marked as STARTED, it may hang indefinitely for various reasons (user API errors, network issues, etc.). You can configure a maximum runtime for every run in a deployment by setting the `run_monitoring.max_runtime_seconds` field in your dagster.yaml or [Dagster+ deployment settings](/dagster-plus/managing-deployments/deployment-settings-reference) to the maximum runtime in seconds. If a run exceeds this timeout and run monitoring is enabled, it will be marked as failed. The `dagster/max_runtime` tag can also be used to set a timeout in seconds on a per-run basis.

For example, to configure a maximum of 2 hours for every run in your deployment:
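
A minimal sketch of the arithmetic behind these settings, written in Python rather than YAML. The helper name `max_runtime_seconds` and the half-hour per-run value are hypothetical; the `enabled` key is an assumption based on the "run monitoring is enabled" condition, and only `run_monitoring.max_runtime_seconds` and the `dagster/max_runtime` tag come from the text above.

```python
SECONDS_PER_HOUR = 3600


def max_runtime_seconds(hours: float) -> int:
    """Convert a runtime limit in hours to whole seconds (hypothetical helper)."""
    return int(hours * SECONDS_PER_HOUR)


# Deployment-wide limit, shaped like the `run_monitoring` block in dagster.yaml.
# Assumption: `enabled` is the key that turns run monitoring on.
deployment_settings = {
    "run_monitoring": {
        "enabled": True,
        "max_runtime_seconds": max_runtime_seconds(2),
    }
}

# Per-run limit, expressed as a run tag whose value is in seconds.
run_tags = {"dagster/max_runtime": max_runtime_seconds(0.5)}

print(deployment_settings["run_monitoring"]["max_runtime_seconds"])  # 7200
print(run_tags["dagster/max_runtime"])  # 1800
```

Both values are plain seconds, so a 2-hour deployment-wide limit is 7200 and a 30-minute per-run limit is 1800.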

23 changes: 15 additions & 8 deletions docs/docs-beta/CONTRIBUTING.md
@@ -80,20 +80,27 @@ Before:

```
<ReferenceTable>
<ReferenceTableItem propertyName="container_context.ecs.env_vars">
A list of keys or key-value pairs to include in the task. If a value is not
specified, the value will be pulled from the agent task.
<br />
In the example above, <code>FOO_ENV_VAR</code> will be set to{" "}
<code>foo_value</code> and <code>BAR_ENV_VAR</code> will be set to whatever
value it has in the agent task.
<ReferenceTableItem propertyName="DAGSTER_CLOUD_DEPLOYMENT_NAME">
The name of the Dagster+ deployment. For example, <code>prod</code>.
</ReferenceTableItem>
<ReferenceTableItem propertyName="DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT">
If <code>1</code>, the deployment is a{" "}
<a href="/dagster-plus/managing-deployments/branch-deployments">
branch deployment
</a>
. Refer to the <a href="#reserved-branch-deployment-variables">
Branch Deployment variables section
</a> for a list of variables available in branch deployments.
</ReferenceTableItem>
</ReferenceTable>
```

After:

_There is not a replacement at this point in time..._
| Key | Value |
|---|---|
| `DAGSTER_CLOUD_DEPLOYMENT_NAME` | The name of the Dagster+ deployment. <br/><br/> **Example:** `prod`. |
| `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` | `1` if the deployment is a [branch deployment](/dagster-plus/features/ci-cd/branch-deployments/index.md). |
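
A minimal sketch of how a table in this format could be generated when migrating many `ReferenceTable` blocks. The `reference_table` helper is hypothetical and not part of the docs tooling; it only mirrors the two-column `Key`/`Value` layout shown above.

```python
def reference_table(rows: list[tuple[str, str]]) -> str:
    """Render (key, description) pairs as a two-column Markdown table."""
    lines = ["| Key | Value |", "|---|---|"]
    for key, description in rows:
        # Escape pipes so a description cannot break the table row.
        cell = description.replace("|", "\\|")
        lines.append(f"| `{key}` | {cell} |")
    return "\n".join(lines)


print(
    reference_table(
        [
            ("DAGSTER_CLOUD_DEPLOYMENT_NAME", "The name of the Dagster+ deployment."),
            ("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT", "`1` if the deployment is a branch deployment."),
        ]
    )
)
```

Links and `<br/>` breaks in a description pass through unchanged, since Markdown table cells accept inline markup.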

### Whitespace via `{" "}`

@@ -83,5 +83,3 @@ dagster-cloud serverless deploy-python-executable ./my-dagster-project \

</TabItem>
</Tabs>

---
@@ -8,9 +8,7 @@ sidebar_position: 10

Dagster+ Serverless is a fully managed version of Dagster+ and is the easiest way to get started with Dagster. With a Serverless deployment, you can run your Dagster jobs without spinning up any infrastructure yourself.

---

## When to choose Serverless \{#when-to-choose-serverless}
## Serverless vs Hybrid

Serverless works best with workloads that primarily orchestrate other services or perform light computation. Most workloads fit into this category, especially those that orchestrate third-party SaaS products like cloud data warehouses and ETL tools.

@@ -21,9 +19,7 @@ If any of the following are applicable, you should select [Hybrid deployment](/d
- You need to distribute computation across many nodes for a single run. Dagster+ runs currently execute on a single node with 4 CPUs
- You don't want to add Dagster Labs as a data processor

---

## Limitations \{#limitations}
## Limitations

Serverless is subject to the following limitations:

@@ -36,8 +32,6 @@ Serverless is subject to the following limitations:

Dagster+ Pro customers may request a quota increase by [contacting Sales](https://dagster.io/contact).

---

## Next steps

To start using Dagster+ Serverless, follow our [Getting started with Dagster+](/dagster-plus/getting-started) guide.
To start using Dagster+ Serverless, follow the steps in [Getting started with Dagster+](/dagster-plus/getting-started).
@@ -15,8 +15,6 @@ To follow the steps in this guide, you'll need:
- An understanding of [Dagster+ deployment settings](/dagster-plus/deployment/management/settings/deployment-settings)
</details>

---

## Differences between isolated and non-isolated runs

- [**Isolated runs**](#isolated-runs-default) execute in their own container. They're the default and are intended for production and compute-heavy use cases.
@@ -7,13 +7,13 @@ sidebar_position: 100
By default, Dagster+ Serverless packages your code as PEX files and deploys them on Docker images. Using PEX files significantly reduces the time to deploy since it does not require building a new Docker image and provisioning a new container for every code change. However, you can customize the Serverless runtime environment in various ways:

- [Add dependencies](#add-dependencies)
- [Use a different Python version](#python-version)
- [Use a different base image](#base-image)
- [Include data files](#data-files)
- [Disable PEX deploys](#disable-pex)
- [Use private Python packages](#private-packages)
- [Use a different Python version](#use-a-different-python-version)
- [Use a different base image](#use-a-different-base-image)
- [Include data files](#include-data-files)
- [Disable PEX deploys](#disable-pex-deploys)
- [Use private Python packages](#use-private-python-packages)

## Add dependencies \{#add-dependencies}
## Add dependencies

You can add dependencies by including the corresponding Python libraries in your Dagster project's `setup.py` file. These should follow [PEP 508](https://peps.python.org/pep-0508/).

@@ -39,9 +39,9 @@ setup(
)
```

To add a package from a private GitHub repository, see: [Use private Python packages](#private-packages)
To add a package from a private GitHub repository, see [Use private Python packages](#use-private-python-packages)

## Use a different Python version \{#python-version}
## Use a different Python version

The default Python version for Dagster+ Serverless is Python 3.9. Python versions 3.10 through 3.12 are also supported. You can specify the Python version you want to use in your GitHub or GitLab workflow, or by using the `dagster-cloud` CLI.

@@ -70,7 +70,7 @@ dagster-cloud serverless deploy-python-executable --python-version=3.11 --locati
</TabItem>
</Tabs>

## Use a different base image \{#base-image}
## Use a different base image

Dagster+ runs your code on a Docker image that we build as follows:

@@ -117,7 +117,7 @@ Setting a custom base image isn't supported for GitLab CI/CD workflows out of th
</TabItem>
</Tabs>
## Include data files \{#data-files}
## Include data files
To add data files to your deployment, use the [Data Files Support](https://setuptools.pypa.io/en/latest/userguide/datafiles.html) built into Python's `setup.py`. This requires adding a `package_data` or `include_package_data` keyword in the call to `setup()` in `setup.py`. For example, given this directory structure:
@@ -134,7 +134,7 @@ To add data files to your deployment, use the [Data Files Support](https://setup
If you want to include the data folder, modify your `setup.py` to add the `package_data` line:
<CodeExample filePath="dagster-plus/deployment/deployment-types/serverless/runtime-environment/data_files_setup.py" language="Python" title="Loading data files in setup.py" />
## Disable PEX deploys \{#disable-pex}
## Disable PEX deploys
You have the option to disable PEX-based deploys and deploy using a Docker image instead of PEX. You can disable PEX in your GitHub or GitLab workflow, or by using the `dagster-cloud` CLI.
@@ -200,7 +200,7 @@ Setting a custom base image isn't supported for GitLab CI/CD workflows out of th
</TabItem>
</Tabs>

## Use private Python packages \{#private-packages}
## Use private Python packages

If you use PEX deploys in your workflow (`ENABLE_FAST_DEPLOYS: 'true'`), the following steps can install a package from a private GitHub repository, e.g. `my-org/private-repo`, as a dependency:

@@ -32,8 +32,6 @@ To prevent this, you can use [another I/O manager](/guides/build/configure/io-ma
You must have [boto3](https://pypi.org/project/boto3/) or `dagster-cloud[serverless]` installed as a project dependency; otherwise, the Dagster+ managed storage can fail and silently fall back to using the default I/O manager.
:::

## Adding environment variables and secrets \{#adding-secrets}
## Adding environment variables and secrets

Often you'll need to securely access secrets from your jobs. Dagster+ supports several methods for adding secrets—refer to the [Dagster+ environment variables documentation](/dagster-plus/deployment/management/environment-variables) for more information.

---
22 changes: 12 additions & 10 deletions docs/docs-beta/docs/dagster-plus/getting-started.md
@@ -2,12 +2,16 @@
title: "Getting started with Dagster+"
---

First [create a Dagster+ organization](https://dagster.plus/signup). Note: you can sign up with:
To get started with Dagster+, you will need to create a Dagster+ organization and choose your deployment type (Serverless or Hybrid).

## Create a Dagster+ organization

First, [create a Dagster+ organization](https://dagster.plus/signup). You can sign up with:
- a Google email address
- a GitHub account
- a one-time email link, great if you are using a corporate email. You can setup SSO after completing these steps.
- a one-time email link (ideal if you are using a corporate email). You can set up SSO after completing these steps.

Next, pick your deployment type. Not sure?
## Choose your deployment type

- [Dagster+ Serverless](/dagster-plus/deployment/deployment-types/serverless) is the easiest way to get started and is great for teams with limited DevOps support. In Dagster+ Serverless, your Dagster code is executed in Dagster+. You will need to be okay [giving Dagster+ the credentials](/dagster-plus/deployment/management/environment-variables) to connect to the tools you want to orchestrate.

@@ -20,27 +24,25 @@ The remaining steps depend on your deployment type.

We recommend following the steps in Dagster+ to add a new project.

![Screenshot of Dagster+ serverless NUX](/img/placeholder.svg)

The Dagster+ on-boarding will guide you through:
The Dagster+ onboarding will guide you through:
- creating a Git repository containing your Dagster code
- setting up the necessary CI/CD actions to deploy that repository to Dagster+

:::tip
If you don't have any Dagster code yet, you will have the option to select an example quickstart project or import an existing dbt project
If you don't have any Dagster code yet, you can select an example project or import an existing dbt project.
:::

See the guide on [adding code locations](/dagster-plus/deployment/code-locations) for details.
</TabItem>

<TabItem value="hybrid" label="Dagster+ Hybrid">

## Install a Dagster+ Hybrid agent
**Install a Dagster+ Hybrid agent**

Follow [these guides](/dagster-plus/deployment/deployment-types/hybrid) for installing a Dagster+ Hybrid agent. Not sure which agent to pick? We recommend using the Dagster+ Kubernetes agent in most cases.
Follow [these guides](/dagster-plus/deployment/deployment-types/hybrid) for installing a Dagster+ Hybrid agent. If you're not sure which agent to use, we recommend the [Dagster+ Kubernetes agent](/dagster-plus/deployment/deployment-types/hybrid/kubernetes/index.md) in most cases.


## Setup CI/CD
**Set up CI/CD**

In most cases, your CI/CD process will be responsible for:
- building your Dagster code into a Docker image
17 changes: 17 additions & 0 deletions examples/project_atproto_dashboard/.env.example
@@ -0,0 +1,17 @@
AWS_ENDPOINT_URL=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_BUCKET_NAME=
AWS_ACCOUNT_ID=

MOTHERDUCK_TOKEN=

BSKY_LOGIN=
BSKY_APP_PASSWORD=

DBT_TARGET=

AZURE_POWERBI_CLIENT_ID=
AZURE_POWERBI_CLIENT_SECRET=
AZURE_POWERBI_TENANT_ID=
AZURE_POWERBI_WORKSPACE_ID=
5 changes: 5 additions & 0 deletions examples/project_atproto_dashboard/.gitignore
@@ -0,0 +1,5 @@
tmp*/
storage/
schedules/
history/
atproto-session.txt
52 changes: 52 additions & 0 deletions examples/project_atproto_dashboard/README.md
@@ -0,0 +1,52 @@
# project_atproto_dashboard

An end-to-end demonstration of ingesting data from the ATProto API, modeling it with dbt, and presenting it with Power BI.

![Architecture Diagram](./architecture-diagram.png)

![Project asset lineage](./lineage.svg)

## Features used

1. Ingestion of data-related Bluesky posts
   - Dynamic partitions
   - Declarative automation
   - Concurrency limits
2. Modelling data using _dbt_
3. Representing data in a dashboard

## Getting started

### Environment Setup

Ensure the environment variables listed in `.env.example` are populated in your `.env` file. Start by copying the
template:

```
cp .env.example .env
```

And then populate the fields.
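
A minimal sketch for checking which fields are still empty after copying the template. The `unset_keys` helper is hypothetical and not part of this project, and the sample value `me.bsky.social` is a placeholder.

```python
def unset_keys(env_text: str) -> list[str]:
    """Return the keys in .env-style text whose value is empty."""
    missing = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if not value.strip():
            missing.append(key.strip())
    return missing


# Placeholder content; in practice, read the text of your own `.env` file.
sample = "BSKY_LOGIN=me.bsky.social\nBSKY_APP_PASSWORD=\n\nDBT_TARGET=\n"
print(unset_keys(sample))  # ['BSKY_APP_PASSWORD', 'DBT_TARGET']
```

To check a real file, pass `pathlib.Path(".env").read_text()` instead of the sample string.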

### Development

Install the project dependencies:

    pip install -e ".[dev]"

Start Dagster:

    DAGSTER_HOME=$(pwd) dagster dev

### Unit testing

Tests are in the `project_atproto_dashboard_tests` directory and you can run tests using `pytest`:

    pytest project_atproto_dashboard_tests

## Resources

- https://docs.bsky.app/docs/tutorials/viewing-feeds
- https://docs.bsky.app/docs/advanced-guides/rate-limits
- https://atproto.blue/en/latest/atproto_client/auth.html#session-string
- https://tenacity.readthedocs.io/en/latest/#waiting-before-retrying
6 changes: 6 additions & 0 deletions examples/project_atproto_dashboard/dagster.yaml
@@ -0,0 +1,6 @@
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator

concurrency:
  default_op_concurrency_limit: 1
Empty file.
2 changes: 2 additions & 0 deletions examples/project_atproto_dashboard/dbt_project/.sqlfluff
@@ -0,0 +1,2 @@
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper
13 changes: 13 additions & 0 deletions examples/project_atproto_dashboard/dbt_project/dbt_project.yml
@@ -0,0 +1,13 @@
name: "dbt_project"
version: "1.0.0"
config-version: 2

profile: "bluesky"

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  +materialized: table
@@ -0,0 +1,14 @@
WITH final AS (
    SELECT
        date_trunc('day', created_at) AS post_date,
        count(DISTINCT post_text) AS unique_posts,
        count(DISTINCT author_handle) AS active_authors,
        sum(likes) AS total_likes,
        sum(replies) AS total_comments,
        sum(quotes) AS total_quotes
    FROM {{ ref("latest_feed") }}
    GROUP BY date_trunc('day', created_at)
    ORDER BY date_trunc('day', created_at) DESC
)

SELECT * FROM final
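
A rough sketch of what this model computes, written as plain Python over in-memory rows. The `daily_summary` function and the sample rows are hypothetical; the column names come from the query above. It assumes no NULL values, which SQL's `count(DISTINCT ...)` and `sum` would skip.

```python
from collections import defaultdict
from datetime import datetime


def daily_summary(rows):
    """Aggregate feed rows per calendar day, newest day first."""
    by_day = defaultdict(list)
    for row in rows:
        # Equivalent of date_trunc('day', created_at).
        by_day[row["created_at"].date()].append(row)

    summary = []
    for day in sorted(by_day, reverse=True):
        posts = by_day[day]
        summary.append(
            {
                "post_date": day,
                "unique_posts": len({p["post_text"] for p in posts}),
                "active_authors": len({p["author_handle"] for p in posts}),
                "total_likes": sum(p["likes"] for p in posts),
                "total_comments": sum(p["replies"] for p in posts),
                "total_quotes": sum(p["quotes"] for p in posts),
            }
        )
    return summary


# Made-up sample rows for illustration only.
rows = [
    {"created_at": datetime(2024, 12, 16, 9, 0), "post_text": "a",
     "author_handle": "x", "likes": 2, "replies": 1, "quotes": 0},
    {"created_at": datetime(2024, 12, 16, 18, 30), "post_text": "b",
     "author_handle": "x", "likes": 3, "replies": 0, "quotes": 1},
    {"created_at": datetime(2024, 12, 17, 8, 0), "post_text": "c",
     "author_handle": "y", "likes": 1, "replies": 0, "quotes": 0},
]
for line in daily_summary(rows):
    print(line)
```

Note that `total_comments` is the sum of the `replies` column, matching the alias in the query.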