## Summary & Motivation ## How I Tested These Changes
Showing 3 changed files with 122 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,49 @@ | ||
# dagster-blueprints | ||
|
||
Experimental low-code package for building Dagster Definitions using YAML files. | ||
Dagster’s core definition Python APIs are highly flexible. This flexibility is vital for developers facing a wide range of data engineering problems. However, it can also be dizzying and intimidating to people who need to contribute to data pipelines but aren’t full-time data engineers or Dagster experts: for example, a data analyst who wants to pull in data from a new data source, or a scientist who wants to wire up some preexisting shell scripts into a pipeline. | |
|
||
"Blueprints" is a layer built on top of these APIs that offers guardrails and simpler interfaces for contributing to data pipelines. Blueprints are intended for common, repetitive pipeline authoring tasks, such as: | |
|
||
- Configuring a data sync of a common type | ||
- Putting a shell script on a schedule | ||
- Putting a Databricks notebook on a schedule | ||
|
||
A blueprint is a simple blob of data that describes how to construct one or more Dagster definitions. Because blueprints are simple blobs of data, they can be authored in YAML. In the future, it should also be possible to author them in JSON, and perhaps even in Dagster's UI. | |
|
||
![image](https://github.com/dagster-io/dagster/assets/654855/d65c9db3-cf1f-4a0f-a5aa-63be36e99076) | ||
|
||
Blueprints are intended to be heavily customized within an organization. While Dagster provides some blueprint types out of the box, the expectation is that data platform engineers will write Python code to curate and develop the set of blueprint types that their stakeholders have access to. | ||
|
||
![image](https://github.com/dagster-io/dagster/assets/654855/660983f4-a581-4094-8f66-c8a95e4299c3) | ||
|
||
## Blueprints vs. parsing YAML on your own | ||
|
||
Why use Blueprints when you can write your own code to parse YAML and generate Dagster definitions? | ||
|
||
- Schematized – Blueprints are typed using Pydantic classes. This enables the Dagster blueprints library to offer utilities that streamline YAML/JSON development: | ||
  - Generate configuration for VS Code that offers typeahead and type-checking for YAML. | |
  - Produce high-quality errors when values don’t conform to types, linked to positions in the source YAML file. | |
- Code references – Blueprints automatically attach metadata that link definitions to the YAML they were generated from. | ||
- Built-ins – Take advantage of built-in blueprint types and use them seamlessly alongside your own custom types. | |
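
To make the "Schematized" point concrete, here is a minimal sketch of validating a parsed blob against a typed schema and reporting which file a bad value came from. It uses the standard library's `dataclasses` as a stand-in for Pydantic so that it runs without dependencies. `ScheduledScriptBlueprint`, its fields, and `parse_blueprint` are hypothetical names chosen for illustration; they are not part of dagster-blueprints, and the real library may report errors differently.

```python
from dataclasses import dataclass, fields


@dataclass
class ScheduledScriptBlueprint:
    """Hypothetical blueprint type, for illustration only."""

    script_path: str
    cron_schedule: str


def parse_blueprint(blob: dict, source: str) -> ScheduledScriptBlueprint:
    """Validate a raw blob against the schema, naming the source file on failure."""
    expected = {f.name: f.type for f in fields(ScheduledScriptBlueprint)}

    # Reject fields the schema does not know about.
    unknown = sorted(set(blob) - set(expected))
    if unknown:
        raise ValueError(f"{source}: unknown field(s) {unknown}")

    # Check that every declared field is present and has the declared type.
    for name, expected_type in expected.items():
        if name not in blob:
            raise ValueError(f"{source}: missing field '{name}'")
        if not isinstance(blob[name], expected_type):
            raise ValueError(
                f"{source}: field '{name}' expected {expected_type.__name__}, "
                f"got {type(blob[name]).__name__}"
            )
    return ScheduledScriptBlueprint(**blob)
```

A blob with a misspelled or mistyped field fails at load time with a message that names the offending field and file, rather than failing later inside a pipeline run.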
|
||
## How to try out Blueprints | ||
|
||
### Install | ||
|
||
Clone the Dagster repo: | ||
|
||
```bash | |
git clone https://github.com/dagster-io/dagster.git | ||
``` | ||
|
||
Install the dagster-blueprints package, as well as the HEAD version of Dagster: | ||
|
||
```bash | |
pip install -e python_modules/dagster | ||
pip install -e examples/experimental/dagster-blueprints/ | ||
``` | ||
|
||
### Try out one of the examples: | ||
|
||
- [Built-in blueprints](examples/builtin-blueprints) | ||
- [Custom blueprints](examples/custom-blueprints) |
36 changes: 36 additions & 0 deletions
examples/experimental/dagster-blueprints/examples/builtin-blueprints/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
## Example: using the built-in `ShellCommandBlueprint` to build pipelines of shell commands in YAML | |
|
||
Imagine that people in your organization work heavily with shell scripts that process data, and you want them to write simple YAML files that compose these shell scripts into data pipelines that can be viewed and executed in Dagster. | ||
|
||
The built-in `ShellCommandBlueprint` can help with this. A `ShellCommandBlueprint` is a blueprint for one or more assets whose shared materialization function is a shell command. It accepts two fields: | ||
|
||
- `command`: The shell command, given as a string. | ||
- `assets`: Specs for the assets that are generated by the shell command. Each spec includes fields like `key` and `deps`. | ||
|
||
Using it involves two kinds of files: | |
|
||
- The YAML files themselves, which contain blobs that conform to the `ShellCommandBlueprint` spec. E.g. [builtin_blueprints/pipelines/process_customers.yaml](builtin_blueprints/pipelines/process_customers.yaml). | ||
- The Python harness - a few lines of Python code that parse the YAML files to generate definitions to load into Dagster. These are located in the [builtin_blueprints/definitions.py](builtin_blueprints/definitions.py) file. | ||
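
As a rough illustration of how the `deps` in each asset spec tie separate commands into a pipeline, the sketch below derives a run order from blobs shaped like the two documented fields, `command` and `assets`. The blob contents, the script names, and the `command_order` helper are assumptions made for illustration. In practice Dagster resolves the asset graph itself, and this is not a description of how the library is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical blobs, shaped like parsed YAML with the two documented fields.
blobs = [
    {
        "command": "./clean_customers.sh",
        "assets": [{"key": "clean_customers", "deps": ["raw_customers"]}],
    },
    {
        "command": "./fetch_customers.sh",
        "assets": [{"key": "raw_customers", "deps": []}],
    },
]


def command_order(blobs: list[dict]) -> list[str]:
    """Return the commands ordered so that upstream assets are produced first."""
    # Map each asset key to the command that produces it.
    producer = {
        spec["key"]: blob["command"] for blob in blobs for spec in blob["assets"]
    }

    # Build a graph of command -> set of commands it depends on.
    graph: dict[str, set[str]] = {}
    for blob in blobs:
        upstream = graph.setdefault(blob["command"], set())
        for spec in blob["assets"]:
            for dep in spec.get("deps", []):
                # Deps with no producer here are treated as external sources.
                if dep in producer and producer[dep] != blob["command"]:
                    upstream.add(producer[dep])

    return list(TopologicalSorter(graph).static_order())


print(command_order(blobs))
```

Even though the cleaning step is listed first, the fetch command is ordered ahead of it because `clean_customers` declares `raw_customers` as a dependency.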
|
||
### Try it out | ||
|
||
Make sure the blueprints library is installed, using the instructions [here](../../README.md#install). | ||
|
||
Install the example: | ||
|
||
```bash | |
pip install -e . | ||
``` | ||
|
||
Launch Dagster to see the definitions loaded from the blueprints: | ||
|
||
```bash | ||
dagster dev | ||
``` | ||
|
||
Print out the JSON schema for the blueprints: | ||
|
||
```bash | ||
dagster-blueprints print-schema | ||
``` |
39 changes: 39 additions & 0 deletions
examples/experimental/dagster-blueprints/examples/custom-blueprints/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
## Example: writing a custom blueprint type to simplify remote CSV -> warehouse ingestion | ||
|
||
Imagine that analytics engineers in your organization often need to work with data hosted on the internet. You want to make it easy for them to develop syncs that bring CSV files from the internet into your data warehouse. | ||
|
||
Ideally, all they need to write to set up a sync is this: | ||
|
||
```yaml | ||
type: curl_asset | ||
table_name: customers | ||
csv_url: https://somewebsite.com/customers.csv | ||
``` | ||
|
||
Custom blueprint types can help with this. Using them involves two kinds of files: | |
|
||
- The YAML files themselves, which contain blobs that look like the above. E.g. [custom_blueprints/curl_assets/customers.yaml](custom_blueprints/curl_assets/customers.yaml). | |
- Python code that defines our custom blueprint type and uses it to load these YAML files into Dagster definitions. This is located in the [custom_blueprints/definitions.py](custom_blueprints/definitions.py) file. | |
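
The idea of a custom blueprint type can be sketched without Dagster at all: a typed record for the three YAML fields, plus a lookup from the `type` value to the class that handles it. Everything below is a hypothetical stand-in written with the standard library. The class name, the `download_command` method, the registry, and the choice of `curl` flags are assumptions for illustration and may differ from the code in `custom_blueprints/definitions.py`.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class CurlAssetBlueprint:
    """Hypothetical typed view of a `curl_asset` blob."""

    type: str
    table_name: str
    csv_url: str

    def download_command(self) -> str:
        """Build a shell command that fetches the CSV to a local file."""
        local_file = f"{self.table_name}.csv"
        return (
            "curl --fail --silent "
            f"--output {shlex.quote(local_file)} {shlex.quote(self.csv_url)}"
        )


# Hypothetical registry: maps the `type` field to the class that handles it.
BLUEPRINT_TYPES = {"curl_asset": CurlAssetBlueprint}


def load_blueprint(blob: dict) -> CurlAssetBlueprint:
    """Pick the blueprint class from the blob's `type` field and construct it."""
    cls = BLUEPRINT_TYPES.get(blob.get("type"))
    if cls is None:
        raise ValueError(f"unknown blueprint type: {blob.get('type')!r}")
    return cls(**blob)


# The blob below mirrors the YAML shown above, as it would look once parsed.
blueprint = load_blueprint(
    {
        "type": "curl_asset",
        "table_name": "customers",
        "csv_url": "https://somewebsite.com/customers.csv",
    }
)
print(blueprint.download_command())
```

Keeping the `type` field as a discriminator is what lets built-in and custom blueprint types coexist in the same set of YAML files.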
|
||
### Try it out | |
|
||
Make sure the blueprints library is installed, using the instructions [here](../../README.md#install). | |
|
||
Install the example: | |
|
||
```bash | |
pip install -e . | ||
``` | ||
|
||
Launch Dagster to see the definitions loaded from the blueprints: | ||
|
||
```bash | ||
dagster dev | ||
``` | ||
|
||
Print out the JSON schema for the blueprints: | ||
|
||
```bash | ||
dagster-blueprints print-schema | ||
``` |