[components] [docs] Initial components guide
smackesey committed Dec 20, 2024
1 parent 8d466fd commit bef1e43
Showing 1 changed file with 202 additions and 0 deletions.
202 changes: 202 additions & 0 deletions docs/docs-beta/docs/guides/build/components.md
---
title: "Components"
sidebar_position: 200
---

Welcome to Dagster Components.

Dagster Components is a new way to structure your Dagster projects. It aims to provide:

- An opinionated project layout that supports ongoing scaffolding, from "Hello world" to the most advanced projects
- A class-based interface for dynamically constructing definitions
- A toolkit for building YAML DSL frontends for components, so that components can be constructed in a low-code fashion
- A format for components to provide their own scaffolding, to organize and reference integration-specific artifact files

## Project Setup

First, let's install the `dg` command-line tool. It lives in the published Python package `dagster-dg`. `dg` is designed to be installed globally and has no dependency on `dagster` itself. We will use the [tool]() feature of the Python package manager `uv` to install a globally available `dg`. `dg` also uses `uv` internally to manage the Python environment associated with your project. The command below installs `dg` in editable mode (`-e`) from a local clone of the Dagster repository, whose location is given by `$DAGSTER_GIT_REPO_DIR`.

```bash
brew install uv && uv tool install -e "$DAGSTER_GIT_REPO_DIR/python_modules/libraries/dagster-dg/"
```

Let's have a look at what's available:

```bash
dg --help

Usage: dg [OPTIONS] COMMAND [ARGS]...

  CLI tools for working with Dagster components.

Commands:
  code-location   Commands for operating code location directories.
  component       Commands for operating on components.
  component-type  Commands for operating on components types.
  deployment      Commands for operating on deployment directories.

Options:
  --builtin-component-lib TEXT  Specify a builitin component library to use.
  --verbose                     Enable verbose output for debugging.
  --disable-cache               Disable caching of component registry data.
  --clear-cache                 Clear the cache before running the command.
  --rebuild-component-registry  Recompute and cache the set of available component types for the current environment.
                                Note that this also happens automatically whenever the cache is detected to be stale.
  --cache-dir PATH              Specify a directory to use for the cache.
  -v, --version                 Show the version and exit.
  -h, --help                    Show this message and exit.
```

We're going to generate a new code location.

```bash
dg code-location generate jaffle_platform
```

Let's have a look at what it generated:

```bash
cd jaffle_platform && tree
```

You can see that we have a basic project structure with a few non-standard files/directories:

- `jaffle_platform/components`: this is where we will define our components
- `jaffle_platform/lib`: this is where we can put custom component types
- `definitions.py`: this comes preloaded with some basic code that will scrape up and merge all the Dagster definitions from our components.

## Hello Platform

We are going to set up a data platform using Sling to ingest data, dbt to process the data, and Python for AI.

### Ingest

First, we set up Sling. If we list the available component types in our code location, we don't see anything Sling-related:

```bash
dg component-type list

dagster_components.pipes_subprocess_script_collection
Assets that wrap Python scripts executed with Dagster's PipesSubprocessClient.
```

This is because the basic `dagster-components` package is lightweight and doesn't include components for specific tools. We can get access to a `sling` component by installing the `sling` extra, along with `dagster-embedded-elt`:

```bash
uv add 'dagster-components[sling]' dagster-embedded-elt
```

Now let's see what's available:

```bash
dg component-type list
dagster_components.pipes_subprocess_script_collection
Assets that wrap Python scripts executed with Dagster's PipesSubprocessClient.
dagster_components.sling_replication
```

Great! We now have the `dagster_components.sling_replication` component type available. Let's create a new instance of this component:

```bash
dg component generate dagster_components.sling_replication ingest_files

Creating a Dagster component instance folder at /Users/smackesey/stm/code/elementl/tmp/jaffle_platform/jaffle_platform/components/ingest_files.
```

This adds a component instance to the project at `jaffle_platform/components/ingest_files`:

```bash
tree jaffle_platform

jaffle_platform/
├── __init__.py
├── __pycache__
│   └── __init__.cpython-312.pyc
├── components
│   └── ingest_files
│       ├── component.yaml
│       └── replication.yaml
├── definitions.py
└── lib
    ├── __init__.py
    └── __pycache__
        └── __init__.cpython-312.pyc

6 directories, 7 files
```

Notice that our component instance folder has two files: `component.yaml` and `replication.yaml`. The `component.yaml` file is common to all Dagster components and specifies the component type and any associated parameters. Right now, the parameters are empty:

```yaml
### jaffle_platform/components/ingest_files/component.yaml
component_type: dagster_components.sling_replication

params: {}
```
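
Because every instance folder carries a `component.yaml` with a top-level `component_type` key, a quick inventory of a code location only needs plain shell tools. The function below is a hypothetical sketch, not a `dg` command: it assumes one instance per subdirectory of `components/` and a single-line `component_type: ...` entry, as in the `component.yaml` shown above.

```bash
# Hypothetical helper (not part of dg): print "<instance> <component_type>"
# for every component instance folder directly under a components directory.
# Assumes each instance has a component.yaml with a single-line
# `component_type: ...` entry at the top level.
list_component_instances() {
  local components_dir="$1"
  local yaml instance ctype
  for yaml in "$components_dir"/*/component.yaml; do
    # If nothing matches, the glob stays a literal pattern; skip it.
    [ -f "$yaml" ] || continue
    instance="$(basename "$(dirname "$yaml")")"
    # Drop the key and any whitespace after the colon, keeping only the value.
    ctype="$(sed -n 's/^component_type:[[:space:]]*//p' "$yaml" | head -n 1)"
    printf '%s %s\n' "$instance" "$ctype"
  done
}
```

Run from the project root at this point in the guide, `list_component_instances jaffle_platform/components` prints `ingest_files dagster_components.sling_replication`.
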

The `replication.yaml` file is Sling-specific: it holds the replication configuration that tells Sling what to copy and where.

We want to replicate data from the public internet into DuckDB. First, we register a Sling connection named `DUCKDB` that points at a local DuckDB file:

```bash
uv run sling conns set DUCKDB type=duckdb instance=/tmp/jaffle_platform.duckdb
4:55PM INF connection `DUCKDB` has been set in /Users/smackesey/.sling/env.yaml. Please test with `sling conns test DUCKDB`
```

```bash
uv run sling conns test DUCKDB

4:55PM INF success!
```

Now let's download the seed files locally (Sling doesn't support reading from the public internet):

```bash
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv &&
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv &&
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv
```
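
By default, `curl -O` saves whatever body the server returns and still exits successfully on an HTTP error, so a mistyped URL can leave an error message on disk under a `.csv` name. A minimal sanity check before pointing Sling at the files, sketched as a hypothetical shell function: it confirms each of the three files is non-empty and prints its first (header) line.

```bash
# Hypothetical helper: verify the downloaded seed files before running Sling.
# Prints "<file>: <header row>" for each file; reports missing or empty files
# on stderr and returns non-zero if any of them are missing or empty.
check_seed_files() {
  local dir="${1:-.}"
  local f status=0
  for f in raw_customers.csv raw_orders.csv raw_payments.csv; do
    if [ -s "$dir/$f" ]; then
      # The first line of each CSV seed should be its column header.
      printf '%s: %s\n' "$f" "$(head -n 1 "$dir/$f")"
    else
      printf '%s: missing or empty\n' "$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_seed_files .` in the directory where you ran `curl`; if a header doesn't look like comma-separated column names, re-download that file.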

Then paste the following into `jaffle_platform/components/ingest_files/replication.yaml`:

```yaml
source: LOCAL
target: DUCKDB

defaults:
  mode: full-refresh
  object: "{stream_table}"

streams:
  file://raw_customers.csv:
    object: "main.raw_customers"
  file://raw_orders.csv:
    object: "main.raw_orders"
  file://raw_payments.csv:
    object: "main.raw_payments"
```
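
The `streams` section follows a simple pattern: each local CSV becomes a `file://` stream loaded into `main.<file name without .csv>`. If you add more seed files later, a hypothetical shell helper like the one below can regenerate the same structure instead of editing it by hand. It only prints YAML and uses no Sling keys beyond those already shown above.

```bash
# Hypothetical helper: print a replication.yaml with the same structure as the
# one above for any list of local CSV file names. Each file becomes a
# file:// stream whose target object is main.<name without .csv>.
render_replication_yaml() {
  local f
  # Quoted heredoc delimiter: "{stream_table}" is emitted literally.
  cat <<'EOF'
source: LOCAL
target: DUCKDB

defaults:
  mode: full-refresh
  object: "{stream_table}"

streams:
EOF
  for f in "$@"; do
    printf '  file://%s:\n    object: "main.%s"\n' "$f" "${f%.csv}"
  done
}
```

For example, `render_replication_yaml raw_customers.csv raw_orders.csv raw_payments.csv > jaffle_platform/components/ingest_files/replication.yaml` writes the three streams listed above.
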

Let's load up our code location in the Dagster UI to see what we've got:

```bash
uv run dagster dev # will be dg dev in the future
```

Click "Materialize All", and we should now have tables in the DuckDB instance. Let's verify on the command line:

```bash
brew install duckdb
duckdb /tmp/jaffle_platform.duckdb -c "SELECT * FROM raw_customers LIMIT 5;"
┌───────┬────────────┬───────────┬──────────────────┐
│  id   │ first_name │ last_name │ _sling_loaded_at │
│ int32 │  varchar   │  varchar  │      int64       │
├───────┼────────────┼───────────┼──────────────────┤
│     1 │ Michael    │ P.        │       1734732030 │
│     2 │ Shawn      │ M.        │       1734732030 │
│     3 │ Kathleen   │ P.        │       1734732030 │
│     4 │ Jimmy      │ C.        │       1734732030 │
│     5 │ Katherine  │ R.        │       1734732030 │
└───────┴────────────┴───────────┴──────────────────┘
```
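
To check all three tables at once instead of just `raw_customers`, you can ask DuckDB for a row count per table. The sketch below builds that query with a hypothetical shell helper, `row_count_sql`, and passes it to the same `duckdb <file> -c` invocation used above; the DuckDB call is skipped if the CLI or the database file isn't present.

```bash
# Hypothetical helper: build a single SQL query that reports the row count of
# each table named on the command line, combined with UNION ALL.
row_count_sql() {
  local t sep=""
  for t in "$@"; do
    printf "%sSELECT '%s' AS table_name, count(*) AS row_count FROM %s" "$sep" "$t" "$t"
    sep=" UNION ALL "
  done
  printf ';\n'
}

# Query the DuckDB file populated by the Sling component, but only if it
# exists and the duckdb CLI is installed.
db=/tmp/jaffle_platform.duckdb
if command -v duckdb >/dev/null 2>&1 && [ -f "$db" ]; then
  duckdb "$db" -c "$(row_count_sql raw_customers raw_orders raw_payments)"
fi
```

After a successful materialization, each of the three tables should report a non-zero row count.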
