run prettier
PedramNavid committed Aug 16, 2024
1 parent 7371e06 commit ec535a4
Showing 63 changed files with 163 additions and 188 deletions.
2 changes: 1 addition & 1 deletion docs/docs-next/docs/changelog.md
@@ -2,4 +2,4 @@
title: "Changelog"
---

# Changelog
# Changelog
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/assets/asset-checks.md
@@ -3,4 +3,4 @@ title: "Asset checks"
sidebar_position: 70
---

# Asset checks
# Asset checks
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/assets/asset-dependencies.md
@@ -3,4 +3,4 @@ title: "Asset dependencies"
sidebar_position: 30
---

# Asset dependencies
# Asset dependencies
@@ -3,4 +3,4 @@ title: "Asset materialization"
sidebar_position: 20
---

# Asset materialization
# Asset materialization
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/assets/asset-metadata.md
@@ -3,4 +3,4 @@ title: "Asset metadata"
sidebar_position: 40
---

# Asset metadata
# Asset metadata
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/assets/thinking-in-assets.md
@@ -3,4 +3,4 @@ title: "Thinking in assets"
sidebar_position: 10
---

# Thinking in assets
# Thinking in assets
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/execution.md
@@ -1 +1 @@
# Execution
# Execution
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/ops-jobs.md
@@ -1 +1 @@
# Ops and jobs
# Ops and jobs
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/partitions.md
@@ -2,4 +2,4 @@
title: "Partitions"
---

# Partitions
# Partitions
2 changes: 1 addition & 1 deletion docs/docs-next/docs/concepts/resources.md
@@ -1 +1 @@
# Resources
# Resources
8 changes: 3 additions & 5 deletions docs/docs-next/docs/concepts/understanding-assets.md
@@ -1,9 +1,7 @@
---
title: Understanding Assets
description: Understanding the concept of assets in Dagster
last_update:
date: 2024-08-11
author: Pedram Navid
last_update:
date: 2024-08-11
author: Pedram Navid
---


2 changes: 1 addition & 1 deletion docs/docs-next/docs/dagster-plus.md
@@ -3,4 +3,4 @@ title: "Dagster+"
displayed_sidebar: "dagsterPlus"
---

# Dagster+
# Dagster+
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides.md
@@ -2,4 +2,4 @@
title: "Guides"
---

# Guides
# Guides
28 changes: 12 additions & 16 deletions docs/docs-next/docs/guides/automation.md
@@ -1,19 +1,17 @@
---
title: "Automating Pipelines"
description: Learn how to automate your data pipelines.
last_update:
date: 2024-08-12
author: Pedram Navid
last_update:
date: 2024-08-12
author: Pedram Navid
---

Automation is key to building reliable, efficient data pipelines.
This guide provides a simplified overview of the main ways to automate processes in Dagster,
helping you choose the right method for your needs. You will find links to more detailed guides for each method below.
Automation is key to building reliable, efficient data pipelines. This guide provides a simplified overview of the main ways to automate processes in Dagster, helping you choose the right method for your needs. You will find links to more detailed guides for each method below.

## What You'll Learn
## What you'll learn

- The different automation options available in Dagster
- How to implement basic scheduling and event-based triggers
- How to implement basic scheduling and event-based triggers
- Best practices for selecting and using automation methods

<details>
@@ -26,32 +24,30 @@ Before continuing, you should be familiar with:

</details>

## Automation Methods Overview
## Automation methods overview

Dagster offers several ways to automate pipeline execution:

1. [Schedules](#schedules) - Run jobs at specified times
2. [Sensors](#sensors) - Trigger runs based on events
3. [Asset Sensors](#asset-sensors) - Trigger jobs when specific assets materialize

Let's look at each method in more detail.

## Schedules
## Schedules

Schedules allow you to run jobs at specified times, like "every Monday at 9 AM" or "daily at midnight."
A schedule combines a selection of assets, known as a [Job](/concepts/ops-jobs), and a [cron expression](https://en.wikipedia.org/wiki/Cron)
in order to define when the job should be run.

To make creating cron expressions easier, you can use an online tool like [Crontab Guru](https://crontab.guru/).
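
The schedule code samples this guide points to live in separate files not shown in this diff. As a minimal sketch of the pattern described above (the asset, job, and schedule names here are placeholders, not from the docs), a schedule that runs a job every Monday at 9 AM might look like:

```python
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


@asset
def daily_report():
    # Placeholder computation; replace with your own asset logic.
    return "report"


# A job built from the selection of assets we want to run on a schedule.
report_job = define_asset_job("report_job", selection=[daily_report])

# "0 9 * * 1" is cron for "every Monday at 9 AM".
report_schedule = ScheduleDefinition(job=report_job, cron_schedule="0 9 * * 1")

defs = Definitions(assets=[daily_report], jobs=[report_job], schedules=[report_schedule])
```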

### When to use Schedules
### When to use schedules

- You need to run jobs at regular intervals
- You want basic time-based automation

For examples of how to create schedules, see the [How-To Use Schedules](/guides/automation/schedules) guide.
For examples of how to create schedules, see [How-To Use Schedules](/guides/automation/schedules).

For more information about how Schedules work, see the [About Schedules](/concepts/schedules) concept page.
For more information about how Schedules work, see [About Schedules](/concepts/schedules).

## Sensors

@@ -72,7 +68,7 @@ For more examples of how to create sensors, see the [How-To Use Sensors](/guides

For more information about how Sensors work, see the [About Sensors](/concepts/sensors) concept page.
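
The sensor examples referenced above are also in files outside this diff. A rough sketch of an event-based sensor, assuming a hypothetical watch directory and job name, could look like:

```python
import os

from dagster import RunRequest, SkipReason, define_asset_job, sensor

# Hypothetical job to trigger when new files arrive.
new_file_job = define_asset_job("new_file_job")


@sensor(job=new_file_job)
def new_file_sensor():
    incoming = "/tmp/incoming"  # illustrative watch directory, not from the docs
    files = os.listdir(incoming) if os.path.isdir(incoming) else []
    if not files:
        yield SkipReason("No new files found")
        return
    for filename in files:
        # run_key de-duplicates runs for the same file across sensor ticks.
        yield RunRequest(run_key=filename)
```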

## Asset Sensors
## Asset sensors

Asset Sensors trigger jobs when specified assets are materialized, allowing you to create dependencies between jobs or code locations.
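
The full asset sensor example is collapsed in this diff; a minimal sketch of the pattern (the asset key and job name below are placeholders) might be:

```python
from dagster import (
    AssetKey,
    EventLogEntry,
    RunRequest,
    SensorEvaluationContext,
    asset_sensor,
    define_asset_job,
)

# Hypothetical downstream job to run whenever the upstream asset materializes.
downstream_job = define_asset_job("downstream_job")


@asset_sensor(asset_key=AssetKey("upstream_asset"), job=downstream_job)
def upstream_materialized_sensor(context: SensorEvaluationContext, asset_event: EventLogEntry):
    # Fires a run of downstream_job each time "upstream_asset" is materialized.
    yield RunRequest(run_key=context.cursor)
```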

@@ -3,5 +3,3 @@ title: "Adding metadata to assets"
sidebar_position: 40
sidebar_label: "Adding metadata"
---

# Adding metadata to assets
@@ -3,5 +3,3 @@ title: "Creating asset factories"
sidebar_position: 50
sidebar_label: "Creating asset factories"
---

# Creating asset factories
@@ -3,5 +3,3 @@ title: "Creating data assets"
sidebar_position: 10
sidebar_label: "Creating data assets"
---

# Creating data assets
@@ -3,5 +3,3 @@ title: "Creating dependencies between assets"
sidebar_position: 20
sidebar_label: "Creating asset dependencies"
---

# Creating dependencies between assets
@@ -3,9 +3,9 @@ title: How to Pass Data Between Assets
description: Learn how to pass data between assets in Dagster
sidebar_position: 30
sidebar_label: "Passing data between assets"
last_update:
date: 2024-08-11
author: Pedram Navid
last_update:
date: 2024-08-11
author: Pedram Navid
---

As you develop your data pipeline, you'll likely need to pass data between assets. By the end of this guide, you'll have a solid understanding of the different approaches to passing data between assets and when to use each one.
@@ -25,7 +25,7 @@ To follow the steps in this guide, you'll need:

## Overview

In Dagster, assets are the building blocks of your data pipeline and it's common to want to pass data between them. This guide will help you understand how to pass data between assets.
In Dagster, assets are the building blocks of your data pipeline and it's common to want to pass data between them. This guide will help you understand how to pass data between assets.

There are three ways of passing data between assets:

@@ -46,18 +46,21 @@ A common and recommended approach to passing data between assets is explicitly m
In this example, the first asset opens a connection to the SQLite database and writes data to it. The second asset opens a connection to the same database and reads data from it. The dependency between the first asset and the second asset is made explicit through the asset's `deps` argument.
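
The `<CodeExample>` file described here isn't included in this diff; a rough sketch of the pattern (the database path and table names are assumptions, not from the example file) could look like:

```python
import sqlite3

import pandas as pd
from dagster import asset


@asset
def raw_users():
    # Write data to external storage explicitly inside the asset.
    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", None]})
    with sqlite3.connect("users.db") as conn:  # illustrative database path
        df.to_sql("raw_users", conn, if_exists="replace", index=False)


@asset(deps=[raw_users])
def cleaned_users():
    # The dependency is declared explicitly via `deps`; Dagster passes no data,
    # so this asset reads from and writes back to SQLite itself.
    with sqlite3.connect("users.db") as conn:
        df = pd.read_sql("SELECT * FROM raw_users WHERE name IS NOT NULL", conn)
        df.to_sql("cleaned_users", conn, if_exists="replace", index=False)
```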

The benefits of this approach are:

- It's explicit and easy to understand how data is stored and retrieved
- You have maximum flexibility in terms of how and where data is stored, for example, based on environment

The downsides of this approach are:

- You need to manage connections and transactions manually
- You need to handle errors and edge cases, for example, if the database is down or if a connection is closed

## Move Data Between Assets Implicitly Using IO Managers

Dagster's IO Managers are a powerful feature that manages data between assets by defining how data is read from and written to external storage. They help separate business logic from I/O operations, reducing boilerplate code and making it easier to change where data is stored.
Dagster's IO Managers are a powerful feature that manages data between assets by defining how data is read from and written to external storage. They help separate business logic from I/O operations, reducing boilerplate code and making it easier to change where data is stored.

I/O managers handle:

1. **Input**: Reading data from storage and loading it into memory for use by dependent assets.
2. **Output**: Writing data to the configured storage location.

@@ -74,14 +77,16 @@ each step would execute in a separate environment and would not have access to t

:::

The `people()` and `birds()` assets both write their dataframes to DuckDB
The `people()` and `birds()` assets both write their dataframes to DuckDB
for persistent storage. The `combined_data()` asset requests data from both assets by adding them as parameters to the function, and the IO Manager handles reading them from DuckDB and making them available to the `combined_data` function as dataframes. Note that when you use IO Managers you do not need to manually add the asset's dependencies through the `deps` argument.
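
The DuckDB example file itself is collapsed out of this diff; a sketch of what an IO-manager-based version might look like, assuming the `dagster-duckdb-pandas` integration and an illustrative database path:

```python
import pandas as pd
from dagster import Definitions, asset
from dagster_duckdb_pandas import DuckDBPandasIOManager


@asset
def people() -> pd.DataFrame:
    # The IO manager persists the returned DataFrame to DuckDB.
    return pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})


@asset
def birds() -> pd.DataFrame:
    return pd.DataFrame({"id": [1, 2], "name": ["Robin", "Wren"]})


@asset
def combined_data(people: pd.DataFrame, birds: pd.DataFrame) -> pd.DataFrame:
    # Declaring the upstream assets as parameters makes the IO manager load
    # them from DuckDB; no explicit `deps` argument is needed.
    return pd.concat([people, birds])


defs = Definitions(
    assets=[people, birds, combined_data],
    resources={"io_manager": DuckDBPandasIOManager(database="analytics.duckdb")},
)
```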

The benefits of this approach are:

- The reading and writing of data is handled by the IO Manager, reducing boilerplate code
- It's easy to swap out different IO Managers based on environments without changing the underlying asset computation

The downsides of this approach are:

- The IO Manager approach is less flexible should you need to customize how data is read or written to storage
- Some decisions may be made by the IO Manager for you, such as naming conventions that can be hard to override.

@@ -94,7 +99,7 @@ Consider this example:

<CodeExample filePath="guides/data-assets/passing-data-assets/passing-data-avoid.py" language="python" title="Avoid Passing Data Between Assets" />

This example downloads a zip file from Google Drive, unzips it, and loads the data into a pandas DataFrame. It relies on each asset running on the same file system to perform these operations.
This example downloads a zip file from Google Drive, unzips it, and loads the data into a pandas DataFrame. It relies on each asset running on the same file system to perform these operations.

The assets are modeled as tasks, rather than as data assets. For more information on the difference between tasks and data assets, check out the [Thinking in Assets](/concepts/assets/thinking-in-assets) guide.

Expand All @@ -107,18 +112,18 @@ instead within a single asset. This pipeline still assumes enough disk and
memory available to handle the data, but for smaller datasets, it can work well.
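
The refactored example is likewise only referenced via `<CodeExample>`; a condensed sketch of doing the download, unzip, and load inside one asset (the URL and file names are placeholders) might be:

```python
import io
import zipfile

import pandas as pd
import requests
from dagster import asset


@asset
def raw_dataset() -> pd.DataFrame:
    # Download, unzip, and load in a single asset so no intermediate files
    # need to be shared between separate steps.
    response = requests.get("https://example.com/data.zip", timeout=30)  # placeholder URL
    response.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        with archive.open("data.csv") as f:  # placeholder archive member
            return pd.read_csv(f)
```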

The benefits of this approach are:

- All the computation that defines how an asset is created is contained within a single asset, making it easier to understand and maintain
- It can be faster than relying on external storage, and doesn't require the overhead of setting up additional compute instances.


The downsides of this approach are:

- It makes certain assumptions about how much data is being processed
- It can be difficult to reuse functions across assets, since they're tightly coupled to the data they produce
- It may not always be possible to swap functionality based on the environment you are running in. For example, if you are running in a cloud environment, you may not have access to the local file system.


---

## Related Resources

TODO: add links to relevant API documentation here.
TODO: add links to relevant API documentation here.
@@ -3,5 +3,3 @@ title: "Selecting subsets of assets"
sidebar_position: 60
sidebar_label: "Selecting assets"
---

# Selecting subsets of assets
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/deployment.md
@@ -2,4 +2,4 @@
title: "Deployment"
---

# Deployment
# Deployment
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/deployment/aws.md
@@ -3,4 +3,4 @@ title: "Deploying to Amazon Web Services"
sidebar_position: 1
---

# Deploying to Amazon Web Services
# Deploying to Amazon Web Services
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/deployment/azure.md
@@ -3,4 +3,4 @@ title: "Deploying to Microsoft Azure"
sidebar_position: 3
---

# Deploying to Microsoft Azure
# Deploying to Microsoft Azure
@@ -3,4 +3,4 @@ title: "Building a data mesh"
sidebar_position: 6
---

# Building a data mesh
# Building a data mesh
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/deployment/dagster-plus.md
@@ -3,4 +3,4 @@ title: "Deploying to Dagster+"
sidebar_position: 4
---

# Deploying to Dagster+
# Deploying to Dagster+
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/deployment/gcp.md
@@ -3,4 +3,4 @@ title: "Deploying to Google Cloud Platform"
sidebar_position: 2
---

# Deploying to Google Cloud Platform
# Deploying to Google Cloud Platform
@@ -3,4 +3,4 @@ title: "Managing code locations"
sidebar_position: 5
---

# Managing code locations
# Managing code locations
@@ -3,4 +3,4 @@ title: "Migrating from self-hosted to Dagster+"
sidebar_position: 7
---

# Migrating from self-hosted to Dagster+
# Migrating from self-hosted to Dagster+
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/external-systems.md
@@ -2,4 +2,4 @@
title: "External systems"
---

# Data assets
# Data assets
@@ -2,5 +2,3 @@
title: "Adding Python libraries"
sidebar_position: 3
---

# Adding Python libraries
@@ -2,5 +2,3 @@
title: "Connecting databases"
sidebar_position: 1
---

# Connecting databases
@@ -3,4 +3,4 @@ title: "Using API connections"
sidebar_position: 2
---

# Using API connections
# Using API connections
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/monitoring.md
@@ -2,4 +2,4 @@
title: "Monitoring"
---

# Monitoring
# Monitoring
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/monitoring/custom-logging.md
@@ -3,4 +3,4 @@ title: "Setting up custom logging"
sidebar_position: 1
---

# Setting up custom logging
# Setting up custom logging
2 changes: 1 addition & 1 deletion docs/docs-next/docs/guides/monitoring/custom-metrics.md
@@ -3,4 +3,4 @@ title: "Using custom metrics in logs"
sidebar_position: 3
---

# Using custom metrics in logs
# Using custom metrics in logs