[docs] Update tutorials to remove Dagster code from __init__.py (docs) (#23347)

## Summary & Motivation

This PR updates the tutorial docs to match the file structure updated in
#23346
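
The notebook path changes in this diff follow from the file move: once the assets code lives in `assets.py` rather than `assets/__init__.py`, the module sits one directory higher, so the leading `../` is no longer needed. A minimal sketch of that reasoning, assuming `file_relative_path` joins the given path onto the directory containing the module file; `resolve_relative` is a hypothetical stand-in written with the standard library, not Dagster's own helper:

```python
import posixpath


def resolve_relative(module_file: str, relative_path: str) -> str:
    """Join relative_path onto the directory that contains module_file.

    Hypothetical stand-in for the assumed behaviour of file_relative_path,
    with the result normalized so the two layouts can be compared.
    """
    base_dir = posixpath.dirname(module_file)
    return posixpath.normpath(posixpath.join(base_dir, relative_path))


# Old layout: the code lived in a package, one directory below tutorial_template.
old_location = resolve_relative(
    "tutorial_template/assets/__init__.py", "../notebooks/iris-kmeans.ipynb"
)

# New layout: the code lives in a module directly inside tutorial_template.
new_location = resolve_relative(
    "tutorial_template/assets.py", "notebooks/iris-kmeans.ipynb"
)

# Both layouts point at the same notebook file.
assert old_location == new_location
print(new_location)
```

Under that assumption, the old and new snippets refer to the same notebook, which is why only the relative path string changes.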

## How I Tested These Changes

docs preview
maximearmstrong authored and PedramNavid committed Aug 14, 2024
1 parent 851d50d commit 1d56287
Showing 3 changed files with 19 additions and 19 deletions.
@@ -58,7 +58,7 @@ To complete this tutorial, you'll need:

- **A template version of the tutorial project**, which you can use to follow along with the tutorial. This is the `tutorial_template` subfolder. In this folder, you'll also find:

- `assets`, a subfolder containing Dagster assets. We'll use `/assets/__init__.py` to write these.
- `assets`, a subfolder containing Dagster assets. We'll use `/assets.py` to write these.
- `notebooks`, a subfolder containing Jupyter notebooks. We'll use `/notebooks/iris-kmeans.ipynb` to write a Jupyter notebook.

---
@@ -119,17 +119,17 @@ Like many notebooks, this example does some fairly sophisticated work, including

By creating a Dagster asset from our notebook, we can integrate the notebook as part of our data platform. This enables us to make its contents more accessible to developers, stakeholders, and other assets in Dagster.

To create a Dagster asset from a Jupyter notebook, we can use the <PyObject module="dagstermill" object="define_dagstermill_asset" /> function. In `/tutorial_template/assets/__init__.py` add the following code snippet:
To create a Dagster asset from a Jupyter notebook, we can use the <PyObject module="dagstermill" object="define_dagstermill_asset" /> function. In `/tutorial_template/assets.py` add the following code snippet:

```python
# /tutorial_template/assets/__init__.py
# /tutorial_template/assets.py

from dagstermill import define_dagstermill_asset
from dagster import file_relative_path

iris_kmeans_jupyter_notebook = define_dagstermill_asset(
name="iris_kmeans_jupyter",
notebook_path=file_relative_path(__file__, "../notebooks/iris-kmeans.ipynb"),
notebook_path=file_relative_path(__file__, "notebooks/iris-kmeans.ipynb"),
group_name="template_tutorial",
)
```
@@ -152,18 +152,18 @@ We want to execute our Dagster asset and save the resulting notebook to a persis

Additionally, we need to provide a [resource](/concepts/resources) to the notebook to tell Dagster how to store the resulting `.ipynb` file. We'll use an [I/O manager](/concepts/io-management/io-managers) to accomplish this.

Open the `/tutorial_template/__init__.py` file and add the following code:
Open the `/tutorial_template/definitions.py` file and add the following code:

```python
# tutorial_template/__init__.py
# tutorial_template/definitions.py

from dagster import load_assets_from_package_module, Definitions
from dagster import load_assets_from_modules, Definitions
from dagstermill import ConfigurableLocalOutputNotebookIOManager

from . import assets

defs = Definitions(
assets=load_assets_from_package_module(assets),
assets=load_assets_from_modules([assets]),
resources={
"output_notebook_io_manager": ConfigurableLocalOutputNotebookIOManager()
}
@@ -173,7 +173,7 @@ defs = Definitions(

Let's take a look at what's happening here:

- Using <PyObject object="load_assets_from_package_module" />, we've imported all assets in the `assets` module. This approach allows any new assets we create to be automatically added to the `Definitions` object instead of needing to manually add them one by one.
- Using <PyObject object="load_assets_from_modules" />, we've imported all assets in the `assets` module. This approach allows any new assets we create to be automatically added to the `Definitions` object instead of needing to manually add them one by one.

- We provided a dictionary of resources to the `resources` parameter. In this example, that's the <PyObject module="dagstermill" object="ConfigurableLocalOutputNotebookIOManager" /> resource.

@@ -255,10 +255,10 @@ In this step, you'll:

### Step 5.1: Create the Iris dataset asset

To create an asset for the Iris dataset, add the following code to `/tutorial_template/assets/__init__.py`:
To create an asset for the Iris dataset, add the following code to `/tutorial_template/assets.py`:

```python
# /tutorial_template/assets/__init__.py
# /tutorial_template/assets.py

from dagstermill import define_dagstermill_asset
from dagster import asset, file_relative_path
@@ -290,10 +290,10 @@ Let's go over what's happening in this code block:

### Step 5.2: Provide the iris_dataset asset to the notebook asset

Next, we need to tell Dagster that the `iris_datset` asset is input data for the `iris-kmeans` notebook. To do this, add the `ins` parameter to the notebook asset:
Next, we need to tell Dagster that the `iris_dataset` asset is input data for the `iris-kmeans` notebook. To do this, add the `ins` parameter to the notebook asset:

```python
# tutorial_template/assets/__init__.py
# tutorial_template/assets.py
from dagstermill import define_dagstermill_asset
from dagster import asset, file_relative_path, AssetIn
import pandas as pd
@@ -302,7 +302,7 @@ import pandas as pd

iris_kmeans_jupyter_notebook = define_dagstermill_asset(
name="iris_kmeans_jupyter",
notebook_path=file_relative_path(__file__, "../notebooks/iris-kmeans.ipynb"),
notebook_path=file_relative_path(__file__, "notebooks/iris-kmeans.ipynb"),
group_name="template_tutorial",
ins={"iris": AssetIn("iris_dataset")}, # this is the new parameter!
)
6 changes: 3 additions & 3 deletions docs/content/tutorial/connecting-to-external-services.mdx
@@ -37,14 +37,14 @@ You should use resources to manage communicating with external services because

Suppose your Hacker News pipeline in Dagster has picked up momentum, and your stakeholders want to learn more about who is using it. You are tasked with analyzing how many people sign up for Hacker News.

In the scaffolded Dagster project you made in Part 2 of the tutorial, you may have noticed a directory called `resources` with an `__init__.py` file in it. It exposes a resource called `DataGeneratorResource`. This resource generates simulated data about Hacker News signups. You'll use this resource to get the data needed to produce an asset for analysis.
In the scaffolded Dagster project you made in Part 2 of the tutorial, you may have noticed a file called `resources.py`. It exposes a resource called `DataGeneratorResource`. This resource generates simulated data about Hacker News signups. You'll use this resource to get the data needed to produce an asset for analysis.

<Note>
The signup data from Hacker News is fake and generated by the library. This is
simulated data and should not be used for real use cases.
</Note>

In your `__init__.py`, import the class, create an instance of it, and add it to the `resources` argument for your code location's <PyObject object="Definitions" /> object under the key `hackernews_api`. The key used to define a resource in the <PyObject object="Definitions" /> object is the key you'll use to reference the resource later in your code. In this case, we’ll call it `hackernews_api`.
In your `definitions.py`, import the class, create an instance of it, and add it to the `resources` argument for your code location's <PyObject object="Definitions" /> object under the key `hackernews_api`. The key used to define a resource in the <PyObject object="Definitions" /> object is the key you'll use to reference the resource later in your code. In this case, we’ll call it `hackernews_api`.

Verify that your code looks similar to the code below:

@@ -163,7 +163,7 @@ HACKERNEWS_NUM_DAYS_WINDOW=30

Rename this file from `.env.example` to `.env`.

Afterward, you'll use Dagster's `EnvVar` class to access the environment variable. Update your `__init__.py` with the changes below:
Afterward, you'll use Dagster's `EnvVar` class to access the environment variable. Update your `definitions.py` with the changes below:

```python file=/tutorial/connecting/connecting_with_envvar.py startafter=start_add_config_to_resource endbefore=end_add_config_to_resource
from dagster import (
4 changes: 2 additions & 2 deletions docs/content/tutorial/scheduling-your-pipeline.mdx
@@ -18,7 +18,7 @@ By the end of this part of the tutorial, you'll be able to:

A _job_ lets you target a selection of assets to materialize them together as a single action. Assets can belong to multiple jobs.

Your Dagster repository has a file called `tutorial/__init__.py` that is used as a top-level definition for your project. Update the code in this file to add the job using the <PyObject object="define_asset_job"/> function:
Your Dagster repository has a file called `tutorial/definitions.py` that is used as a top-level definition for your project. Update the code in this file to add the job using the <PyObject object="define_asset_job"/> function:

```python file=/tutorial/scheduling/with_job/with_job.py
from dagster import (
@@ -70,7 +70,7 @@ Managing one type of definition, such as assets, is easy. However, it can quickl

After defining a job, it can be attached to a schedule. A schedule's responsibility is to start a run of the assigned job at a specified time. Schedules are added with the <PyObject object="ScheduleDefinition" /> class.

To regularly update the assets, add the new <PyObject object="ScheduleDefinition" /> import, create a new schedule for the `hackernews_job`, and add the schedule to the code location. The code below is how your `__init__.py` should look after making these changes:
To regularly update the assets, add the new <PyObject object="ScheduleDefinition" /> import, create a new schedule for the `hackernews_job`, and add the schedule to the code location. The code below is how your `definitions.py` should look after making these changes:

```python file=/tutorial/scheduling/with_schedule/with_schedule.py
from dagster import (