-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[components] [docs] Initial components guide (#26651)
## Summary & Motivation This is a draft of the initial docs page for Dagster components. It is a guide nested under the "Build" section. It currently contains just the first part of the tutorial. It will be expanded/adjusted in multiple followups. ## How I Tested These Changes Browsed preview.
- Loading branch information
Showing
1 changed file
with
202 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,202 @@ | ||
--- | ||
title: "Components" | ||
sidebar_position: 200 | ||
--- | ||
|
||
Welcome to Dagster Components. | ||
|
||
Dagster Components is a new way to structure your Dagster projects. It aims to provide: | ||
|
||
- An opinionated project layout that supports ongoing scaffolding from “Hello world” to the most advanced projects | ||
- A class-based interface for dynamically constructing definitions | ||
- A toolkit to build YAML DSL frontends for components so that components can be constructed in a low-code fashion. | ||
- A format for components to provide their own scaffolding, in order to organize and reference integration-specific artifacts files. | ||
|
||
## Project Setup | ||
|
||
First let's install the `dg` command line tool. This lives in the published Python package `dagster-dg`. `dg` is designed to be globally installed and has no dependency on `dagster` itself. We will use the tool feature of Python package manager `uv` to install a globally available `dg`. `dg` will also be use `uv` internally to manage the python enviroment associated with your project. | ||
|
||
```bash | ||
brew install uv && uv tool install -e -e $DAGSTER_GIT_REPO_DIR/python_modules/libraries/dagster-dg/ | ||
``` | ||
|
||
Let's have a look at what's available: | ||
|
||
```bash | ||
dg --help | ||
|
||
Usage: dg [OPTIONS] COMMAND [ARGS]... | ||
|
||
CLI tools for working with Dagster components. | ||
|
||
Commands: | ||
code-location Commands for operating code location directories. | ||
component Commands for operating on components. | ||
component-type Commands for operating on components types. | ||
deployment Commands for operating on deployment directories. | ||
|
||
Options: | ||
--builtin-component-lib TEXT Specify a builitin component library to use. | ||
--verbose Enable verbose output for debugging. | ||
--disable-cache Disable caching of component registry data. | ||
--clear-cache Clear the cache before running the command. | ||
--rebuild-component-registry Recompute and cache the set of available component types for the current environment. | ||
Note that this also happens automatically whenever the cache is detected to be stale. | ||
--cache-dir PATH Specify a directory to use for the cache. | ||
-v, --version Show the version and exit. | ||
-h, --help Show this message and exit. | ||
``` | ||
|
||
We're going to generate a new code location. | ||
|
||
```bash | ||
dg code-location generate jaffle_platform | ||
``` | ||
|
||
Let's have a look at what it generated: | ||
|
||
```bash | ||
cd jaffle_platform && tree | ||
``` | ||
|
||
You can see that we have a basic project structure with a few non-standard files/directories: | ||
|
||
- `jaffle_platform/components`: this is where we will define our components | ||
- `jaffle_platform/lib`: this is where we can put custom component types | ||
- `definitions.py`: this comes preloaded with some basic code that will scrape up and merge all the Dagster definitions from our components. | ||
|
||
## Hello Platform | ||
|
||
We are going to set up a data platform using sling to ingest data, dbt to process the data, and python to do AI. | ||
|
||
### Ingest | ||
|
||
First we set up sling. If we query the available component-types in our code location, we don't see anything sling-related: | ||
|
||
```bash | ||
dg component-type list | ||
|
||
dagster_components.pipes_subprocess_script_collection | ||
Assets that wrap Python scripts executed with Dagster's PipesSubprocessClient. | ||
``` | ||
This is because the basic `dagster-components` package is lightweight and doesn't include copmonents for specific tools. We can get access to a `sling` component by installing the `sling` extra: | ||
|
||
```bash | ||
uv add 'dagster-components[sling]' dagster-embedded-elt | ||
``` | ||
|
||
Now let's see what's available: | ||
|
||
```bash | ||
dg component-type list | ||
dagster_components.pipes_subprocess_script_collection | ||
Assets that wrap Python scripts executed with Dagster's PipesSubprocessClient. | ||
dagster_components.sling_replication` | ||
``` | ||
Great-- we now have the `dagster_components.sling_replication` component type available. Let's create a new instance of this component: | ||
```bash | ||
dg component generate dagster_components.sling_replication ingest_files | ||
|
||
Creating a Dagster component instance folder at /Users/smackesey/stm/code/elementl/tmp/jaffle_platform/jaffle_platform/components/ingest_files. | ||
``` | ||
|
||
This adds a component instance to the project at `jaffle_platform/components/ingest_files`: | ||
|
||
```bash | ||
tree jaffle_platform | ||
|
||
jaffle_platform/ | ||
├── __init__.py | ||
├── __pycache__ | ||
│ └── __init__.cpython-312.pyc | ||
├── components | ||
│ └── ingest_files | ||
│ ├── component.yaml | ||
│ └── replication.yaml | ||
├── definitions.py | ||
└── lib | ||
├── __init__.py | ||
└── __pycache__ | ||
└── __init__.cpython-312.pyc | ||
|
||
6 directories, 7 files | ||
``` | ||
|
||
Notice that our component has two files: `component.yaml` and `replication.yaml`. The `component.yaml` file is common to all Dagster components, and specifies the component type and any associated parameters. Right now the parameters are empty: | ||
|
||
```yaml | ||
### jaffle_platform/components/ingest_files/component.yaml | ||
component_type: dagster_components.sling_replication | ||
|
||
params: {} | ||
``` | ||
The `replication.yaml` file is a sling-specific file. | ||
|
||
We want to replicate data on the public internet into duckdb: | ||
|
||
```bash | ||
uv run sling conns set DUCKDB type=duckdb instance=/tmp/jaffle_platform.duckdb | ||
4:55PM INF connection `DUCKDB` has been set in /Users/smackesey/.sling/env.yaml. Please test with `sling conns test DUCKDB` | ||
``` | ||
|
||
```bash | ||
uv run sling conns test DUCKDB | ||
|
||
4:55PM INF success! | ||
``` | ||
|
||
Now let's download a file locally (sling doesn’t support reading from the public internet): | ||
|
||
```bash | ||
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv && | ||
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv && | ||
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv | ||
``` | ||
|
||
And copy-paste the below code into `replication.yaml`: | ||
|
||
```yaml | ||
source: LOCAL | ||
target: DUCKDB | ||
|
||
defaults: | ||
mode: full-refresh | ||
object: "{stream_table}" | ||
|
||
streams: | ||
file://raw_customers.csv: | ||
object: "main.raw_customers" | ||
file://raw_orders.csv: | ||
object: "main.raw_orders" | ||
file://raw_payments.csv: | ||
object: "main.raw_payments" | ||
``` | ||
Let's load up our code location in the Dagster UI to see what we've got: | ||
```bash | ||
uv run dagster dev # will be dg dev in the future | ||
``` | ||
|
||
Click "Materialize All", and we should now have tables in the DuckDB instance. Let's verify on the command line: | ||
|
||
``` | ||
brew install duckdb | ||
duckdb /tmp/jaffle_platform.duckdb -c "SELECT * FROM raw_customers LIMIT 5;" | ||
┌───────┬────────────┬───────────┬──────────────────┐ | ||
│ id │ first_name │ last_name │ _sling_loaded_at │ | ||
│ int32 │ varchar │ varchar │ int64 │ | ||
├───────┼────────────┼───────────┼──────────────────┤ | ||
│ 1 │ Michael │ P. │ 1734732030 │ | ||
│ 2 │ Shawn │ M. │ 1734732030 │ | ||
│ 3 │ Kathleen │ P. │ 1734732030 │ | ||
│ 4 │ Jimmy │ C. │ 1734732030 │ | ||
│ 5 │ Katherine │ R. │ 1734732030 │ | ||
└───────┴────────────┴───────────┴──────────────────┘ | ||
``` |
1e02550
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deploy preview for dagster-docs-beta ready!
✅ Preview
https://dagster-docs-beta-4eov3vtdf-elementl.vercel.app
Built with commit 1e02550.
This pull request is being automatically deployed with vercel-action