[docs][guide] approaches to writing integrations #22903
Merged Aug 23, 2024

---
title: "Approaches to writing integrations"
---

# Approaches to writing integrations

There are many approaches to writing integrations in Dagster. The choice of approach depends on the specific requirements of the integration, the level of control needed, and the complexity of the external system being integrated. Reviewing the pros and cons of each approach will help you make an informed decision for your specific use case. The following are typical approaches that align with Dagster's best practices.

- Resource providers
- Factory methods
- Multi-Asset decorators
- Pipes protocol

## Resource providers

One of the most fundamental features that can be implemented in an integration is a resource object to interface with an external service. For example, the `dagster-snowflake` integration provides a custom [SnowflakeResource](https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-snowflake/dagster_snowflake/resources.py) that is a wrapper around the Snowflake `connector` object.

### Pros

- **Simple:** Implementing a resource wrapper is often the first step in fleshing out a fully featured integration.
- **Reusable:** Resources are a core building block in the Dagster ecosystem, and they allow you to reuse code across assets.

### Cons

- **Low-level abstraction:** While the resource can be reused throughout the codebase, it does not provide any higher-level abstraction over assets or jobs.

### Guide

<Note>A guide for writing a resource based integration is coming soon!</Note>

## Factory methods

The factory pattern is used for creating multiple similar objects based on a set of specifications. This is often useful in data engineering when you have similar processing that operates on multiple objects with varying parameters.

For example, imagine you would like to perform an operation on a set of tables in a database. You could construct a factory method that takes in a table specification, resulting in a list of assets.

```python
from dagster import Definitions, asset

parameters = [
    {"name": "asset1", "table": "users"},
    {"name": "asset2", "table": "orders"},
]


def process_table(table_name: str) -> None:
    pass


def build_asset(params):
    @asset(name=params["name"])
    def _asset():
        process_table(params["table"])

    return _asset


assets = [build_asset(params) for params in parameters]

defs = Definitions(assets=assets)
```

### Pros

- **Flexibility:** Allows for fine-grained control over the integration logic.
- **Modularity:** Easy to reuse components across different assets and jobs.
- **Explicit configuration:** Resources can be explicitly configured, making it clear what dependencies are required.

### Cons

- **Complexity:** Can be more complex to set up compared to other methods.
- **Boilerplate code:** May require more boilerplate code to define assets, resources, and jobs.

### Guide

<Note>
A guide for writing a factory method based integration is coming soon!
</Note>

## Multi-asset decorators

In the scenario where a single API call or configuration can result in multiple assets, with a shared runtime or dependencies, one may consider creating a multi-asset decorator. Example implementations of this approach include [dbt](https://github.com/dagster-io/dagster/tree/master/python_modules/libraries/dagster-dbt), [dlt](https://github.com/dagster-io/dagster/tree/master/python_modules/libraries/dagster-embedded-elt/dagster_embedded_elt/dlt), and [Sling](https://github.com/dagster-io/dagster/tree/master/python_modules/libraries/dagster-embedded-elt/dagster_embedded_elt/sling).

### Pros

- **Efficiency:** Allows defining multiple assets in a single function, reducing boilerplate code.
- **Simplicity:** Easier to manage related assets together.
- **Consistency:** Ensures that related assets are always defined and updated together.

### Cons

- **Less granular control:** May not provide as much fine-grained control as defining individual assets.
- **Complexity in debugging:** Debugging issues can be more challenging when multiple assets are defined in a single function.

### Guide

<Note>
A guide for writing a multi-asset decorator based integration is coming soon!
</Note>

## Pipes protocol

The Pipes protocol is used to integrate with systems that have their own execution environments. It enables running code in these external environments while allowing Dagster to maintain control and visibility. Example implementations of this approach include [AWS Lambda](https://github.com/dagster-io/dagster/tree/d4b4d5beabf6475c7279b7f02f893a506bca0bb0/python_modules/libraries/dagster-aws/dagster_aws/pipes), [Databricks](https://github.com/dagster-io/dagster/blob/d4b4d5beabf6475c7279b7f02f893a506bca0bb0/python_modules/libraries/dagster-databricks/dagster_databricks/pipes.py), and [Kubernetes](https://github.com/dagster-io/dagster/blob/d4b4d5beabf6475c7279b7f02f893a506bca0bb0/python_modules/libraries/dagster-k8s/dagster_k8s/pipes.py).

### Pros

- **Separation of environments:** Allows running code in external environments, which is useful for integrating with systems that manage their own execution.
- **Flexibility:** Can integrate with a wide range of external systems and languages.
- **Streaming logs and metadata:** Provides support for streaming logs and structured metadata back into Dagster.

### Cons

- **Complexity:** Can be complex to set up and configure.
- **Overhead:** May introduce additional overhead for managing external environments.

### Guide

- [Dagster Pipes details and customization](/concepts/dagster-pipes/dagster-pipes-details-and-customization)