blueprint READMEs (#22736)
## Summary & Motivation

## How I Tested These Changes
sryza authored Jun 27, 2024
1 parent 00d2941 commit 5945a36
Showing 3 changed files with 122 additions and 1 deletion.
48 changes: 47 additions & 1 deletion examples/experimental/dagster-blueprints/README.md
@@ -1,3 +1,49 @@
# dagster-blueprints

Experimental low-code package for building Dagster Definitions using YAML files.
Dagster’s core definition Python APIs are extremely flexible. This flexibility is vital for developers facing a wide range of data engineering problems. However, it can also be dizzying and intimidating to people who need to contribute to data pipelines but aren’t full-time data engineers or Dagster experts: for example, a data analyst who wants to pull in data from a new source, or a scientist who wants to wire preexisting shell scripts into a pipeline.

"Blueprints" is a layer built on top of these APIs that offers guardrails and simpler interfaces for contributing to data pipelines. Blueprints are intended for common, repetitive pipeline authoring tasks, such as:

- Configuring a data sync of a common type
- Putting a shell script on a schedule
- Putting a Databricks notebook on a schedule

A blueprint is a simple blob of data that describes how to construct one or more Dagster definitions. Because blueprints are simple blobs of data, they can be authored in YAML. In the future, they'll also be able to be authored in JSON, and perhaps even in Dagster's UI.

![image](https://github.com/dagster-io/dagster/assets/654855/d65c9db3-cf1f-4a0f-a5aa-63be36e99076)

Blueprints are intended to be heavily customized within an organization. While Dagster provides some blueprint types out of the box, the expectation is that data platform engineers will write Python code to curate and develop the set of blueprint types that their stakeholders have access to.

![image](https://github.com/dagster-io/dagster/assets/654855/660983f4-a581-4094-8f66-c8a95e4299c3)

## Blueprints vs. parsing YAML on your own

Why use Blueprints when you can write your own code to parse YAML and generate Dagster definitions?

- Schematized – Blueprints are typed using Pydantic classes. This enables the Dagster blueprints library to offer utilities that streamline YAML/JSON development:
  - Generated configuration for VS Code that offers typeahead and type-checking for YAML.
  - High-quality errors when values don’t conform to types, linked to positions in the source YAML file.
- Code references – Blueprints automatically attach metadata that link definitions to the YAML they were generated from.
- Built-ins – take advantage of built-in blueprint types and use them seamlessly alongside your own custom types.
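To make the "Schematized" point concrete, here is a minimal sketch of schema-checked blobs. It is not the dagster-blueprints API: it uses standard-library dataclasses as a stand-in for Pydantic, and `ScheduledScriptBlueprint`, `parse_blueprint`, and the file names are hypothetical names chosen for illustration. It only reports the source file name, not line positions within the YAML.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass(frozen=True)
class ScheduledScriptBlueprint:
    """Hypothetical schema: a shell script that runs on a cron schedule."""

    script: str
    cron: str


def parse_blueprint(blob: Dict[str, Any], source: str) -> ScheduledScriptBlueprint:
    """Check an already-parsed YAML blob against the schema.

    `source` is the file the blob came from, so every error names its origin.
    """
    expected = {f.name for f in fields(ScheduledScriptBlueprint)}
    present = set(blob)
    errors: List[str] = []
    for name in sorted(expected - present):
        errors.append(f"{source}: missing field '{name}'")
    for name in sorted(present - expected):
        errors.append(f"{source}: unknown field '{name}'")
    for name in sorted(expected & present):
        if not isinstance(blob[name], str):
            got = type(blob[name]).__name__
            errors.append(f"{source}: field '{name}' must be a string, got {got}")
    if errors:
        # Report all problems at once rather than stopping at the first one.
        raise ValueError("\n".join(errors))
    return ScheduledScriptBlueprint(**blob)
```

Calling `parse_blueprint({"script": 5}, "jobs/bad.yaml")` raises a `ValueError` that lists both the missing `cron` field and the wrongly typed `script` field, each prefixed with the file name. A hand-rolled YAML parser has to reimplement this kind of checking for every new blob shape.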

## How to try out Blueprints

### Install

Clone the Dagster repo:

```bash
git clone https://github.com/dagster-io/dagster.git
```

Install the dagster-blueprints package, as well as the HEAD version of Dagster:

```bash
pip install -e python_modules/dagster
pip install -e examples/experimental/dagster-blueprints/
```

### Try out one of the examples

- [Built-in blueprints](examples/builtin-blueprints)
- [Custom blueprints](examples/custom-blueprints)
@@ -0,0 +1,36 @@
## Example: using the built-in `ShellCommandBlueprint` to build pipelines of shell commands in YAML

Imagine that people in your organization work heavily with shell scripts that process data, and you want them to write simple YAML files that compose these shell scripts into data pipelines that can be viewed and executed in Dagster.

The built-in `ShellCommandBlueprint` can help with this. A `ShellCommandBlueprint` is a blueprint for one or more assets whose shared materialization function is a shell command. It accepts two fields:

- `command`: The shell command, given as a string.
- `assets`: Specs for the assets that are generated by the shell command. Each spec includes fields like `key` and `deps`.
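As a rough illustration of the two fields above, the sketch below writes a blob as the Python dict a YAML parser would produce and summarises it. The command, the asset keys, and the `describe` helper are all hypothetical; the real YAML files in this example may carry additional fields, and nothing here calls the dagster-blueprints library.

```python
from typing import Any, Dict, List

# Hypothetical blob with the two fields described above. The command and
# asset keys are made up for illustration.
blob: Dict[str, Any] = {
    "command": "python process_customers.py",
    "assets": [
        {"key": "processed_customers", "deps": ["raw_customers"]},
    ],
}


def describe(blob: Dict[str, Any]) -> List[str]:
    """Summarise which assets the command produces and what they depend on."""
    lines = []
    for spec in blob["assets"]:
        # `deps` is treated as optional here; an asset without it has no upstreams.
        deps = ", ".join(spec.get("deps", [])) or "nothing"
        lines.append(f"{spec['key']} <- {deps} (via `{blob['command']}`)")
    return lines


for line in describe(blob):
    print(line)
```

All assets listed under `assets` share the single `command`, which matches the description of one shell command acting as the materialization function for every asset in the blueprint.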

Using it involves two kinds of files.

- The YAML files themselves, which contain blobs that conform to the `ShellCommandBlueprint` spec. E.g. [builtin_blueprints/pipelines/process_customers.yaml](builtin_blueprints/pipelines/process_customers.yaml).
- The Python harness - a few lines of Python code that parse the YAML files to generate definitions to load into Dagster. These are located in the [builtin_blueprints/definitions.py](builtin_blueprints/definitions.py) file.

### Try it out

Make sure the blueprints library is installed, using the instructions [here](../../README.md#install).

Install the example:

```bash
pip install -e .
```

Launch Dagster to see the definitions loaded from the blueprints:

```bash
dagster dev
```

Print out the JSON schema for the blueprints:

```bash
dagster-blueprints print-schema
```
@@ -0,0 +1,39 @@
## Example: writing a custom blueprint type to simplify remote CSV -> warehouse ingestion

Imagine that analytics engineers in your organization often need to work with data hosted on the internet. You want to make it easy for them to develop syncs that bring CSV files from the internet into your data warehouse.

Ideally, all they need to write to set up a sync is this:

```yaml
type: curl_asset
table_name: customers
csv_url: https://somewebsite.com/customers.csv
```

Custom blueprint types can help with this. Using them involves two kinds of files:

- The YAML files themselves, which contain blobs that look like the above. E.g. [custom_blueprints/curl_assets/customers.yaml](custom_blueprints/curl_assets/customers.yaml).
- Python code that defines our custom blueprint type and uses it to load these YAML files into Dagster definitions. This is located in the [custom_blueprints/definitions.py](custom_blueprints/definitions.py) file.

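The shape of a custom blueprint type can be sketched without any Dagster imports. The code below is a standard-library stand-in, not the contents of `definitions.py` and not the dagster-blueprints API: `CurlAssetBlueprint`, `build_plan`, and `load_blueprint` are hypothetical names, and the "plan" is a plain dict rather than real Dagster definitions. Only the three YAML fields (`type`, `table_name`, `csv_url`) come from the example above.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CurlAssetBlueprint:
    """Hypothetical stand-in for a custom blueprint type."""

    type: str
    table_name: str
    csv_url: str

    def build_plan(self) -> Dict[str, Any]:
        """Describe the work this blueprint stands for: one asset fed by one download."""
        return {
            "asset_key": self.table_name,
            # -f fails on HTTP errors, -sS is quiet but still shows errors,
            # -L follows redirects, -o names the output file.
            "download_command": [
                "curl", "-fsSL", "-o", f"{self.table_name}.csv", self.csv_url,
            ],
        }


def load_blueprint(blob: Dict[str, Any]) -> CurlAssetBlueprint:
    """Turn a parsed YAML blob into a blueprint, rejecting other blueprint types."""
    if blob.get("type") != "curl_asset":
        raise ValueError(f"unsupported blueprint type: {blob.get('type')!r}")
    # Missing or unexpected keys surface as a TypeError from the dataclass constructor.
    return CurlAssetBlueprint(**blob)
```

The point of the pattern is the split in responsibilities: analytics engineers only write the three-field YAML blob, while the platform engineer owns the Python class that decides what that blob expands into.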
### Try it out

Make sure the blueprints library is installed, using the instructions [here](../../README.md#install).

Install the example:

```bash
pip install -e .
```

Launch Dagster to see the definitions loaded from the blueprints:

```bash
dagster dev
```

Print out the JSON schema for the blueprints:

```bash
dagster-blueprints print-schema
```
