From a4fdec7053300604c82fec1480bec2d1f22af785 Mon Sep 17 00:00:00 2001 From: Dennis Hume Date: Fri, 13 Dec 2024 07:44:22 -0600 Subject: [PATCH] [du] consistency issues (#26416) ## Summary & Motivation Minor consistency issues from working through Dagster University ## How I Tested These Changes ## Changelog > Insert changelog entry or delete this section. --- docs/dagster-university/next-env.d.ts | 2 +- .../lesson-2/requirements-and-installation.md | 5 +++-- .../dagster-essentials/lesson-3/whats-an-asset.md | 4 ++-- .../lesson-4/coding-practice-taxi-zones-asset.md | 4 ++-- .../lesson-4/loading-data-into-a-database.md | 14 +++++++------- .../lesson-6/using-resources-in-assets.md | 8 ++++---- .../lesson-7/what-are-schedules.md | 2 +- .../lesson-8/adding-partitions-to-assets.md | 6 +++--- .../coding-practice-partition-taxi-trips.md | 2 +- ...ating-a-schedule-with-a-date-based-partition.md | 6 ++++-- 10 files changed, 28 insertions(+), 25 deletions(-) diff --git a/docs/dagster-university/next-env.d.ts b/docs/dagster-university/next-env.d.ts index 4f11a03dc6cc3..a4a7b3f5cfa2f 100644 --- a/docs/dagster-university/next-env.d.ts +++ b/docs/dagster-university/next-env.d.ts @@ -2,4 +2,4 @@ /// // NOTE: This file should not be edited -// see https://nextjs.org/docs/basic-features/typescript for more information. +// see https://nextjs.org/docs/pages/building-your-application/configuring/typescript for more information. diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-2/requirements-and-installation.md b/docs/dagster-university/pages/dagster-essentials/lesson-2/requirements-and-installation.md index 363718ca770fe..b074d2757ae49 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-2/requirements-and-installation.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-2/requirements-and-installation.md @@ -9,11 +9,12 @@ lesson: '2' To install Dagster, you’ll need: - **To install Python**. Dagster supports Python 3.9 through 3.12. -- **A package manager like pip or poetry**. If you need to install a package manager, refer to the following installation guides: +- **A package manager like pip, Poetry, or uv**. If you need to install a package manager, refer to the following installation guides: - [pip](https://pip.pypa.io/en/stable/installation/) - [Poetry](https://python-poetry.org/docs/) + - [uv](https://docs.astral.sh/uv/getting-started/installation/) -To check that Python and the pip or Poetry package manager are already installed in your environment, run: +To check that Python and the package manager are already installed in your environment, run: ```shell python --version diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-3/whats-an-asset.md b/docs/dagster-university/pages/dagster-essentials/lesson-3/whats-an-asset.md index 7e3d27ebd2cee..99499c844f619 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-3/whats-an-asset.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-3/whats-an-asset.md @@ -10,8 +10,8 @@ An asset is an object in persistent storage that captures some understanding of - **A database table or view**, such as those in a Google BigQuery data warehouse - **A file**, such as a file in your local machine or blob storage like Amazon S3 -- **A machine learning model** -- **An asset from an integration,** like a dbt model or a Fivetran connector +- **A machine learning model**, such as TensorFlow or PyTorch +- **An asset from an integration,** such as a dbt model or a Fivetran connector Assets aren’t limited to just the objects listed above - these are just some common examples. diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset.md b/docs/dagster-university/pages/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset.md index 92eab5b00a8b5..5cb1bc9b33abc 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset.md @@ -30,7 +30,7 @@ The asset you built should look similar to the following code. Click **View answ deps=["taxi_zones_file"] ) def taxi_zones() -> None: - sql_query = f""" + query = f""" create or replace table zones as ( select LocationID as zone_id, @@ -42,5 +42,5 @@ def taxi_zones() -> None: """ conn = duckdb.connect(os.getenv("DUCKDB_DATABASE")) - conn.execute(sql_query) + conn.execute(query) ``` diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-4/loading-data-into-a-database.md b/docs/dagster-university/pages/dagster-essentials/lesson-4/loading-data-into-a-database.md index 0bf9ced624484..560c46b30273c 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-4/loading-data-into-a-database.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-4/loading-data-into-a-database.md @@ -25,7 +25,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage """ The raw taxi trips dataset, loaded into a DuckDB database """ - sql_query = """ + query = """ create or replace table trips as ( select VendorID as vendor_id, @@ -43,7 +43,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage """ conn = duckdb.connect(os.getenv("DUCKDB_DATABASE")) - conn.execute(sql_query) + conn.execute(query) ``` Let’s walk through what this code does: @@ -52,13 +52,13 @@ Now that you have a query that produces an asset, let’s use Dagster to manage 2. The `taxi_trips_file` asset is defined as a dependency of `taxi_trips` through the `deps` argument. - 3. Next, a variable named `sql_query` is created. This variable contains a SQL query that creates a table named `trips`, which sources its data from the `data/raw/taxi_trips_2023-03.parquet` file. This is the file created by the `taxi_trips_file` asset. + 3. Next, a variable named `query` is created. This variable contains a SQL query that creates a table named `trips`, which sources its data from the `data/raw/taxi_trips_2023-03.parquet` file. This is the file created by the `taxi_trips_file` asset. 4. A variable named `conn` is created, which defines the connection to the DuckDB database in the project. To do this, it uses the `.connect` method from the `duckdb` library, passing in the `DUCKDB_DATABASE` environment variable to tell DuckDB where the database is located. The `DUCKDB_DATABASE` environment variable, sourced from your project’s `.env` file, resolves to `data/staging/data.duckdb`. **Note**: We set up this file in Lesson 2 - refer to this lesson if you need a refresher. If this file isn’t set up correctly, the materialization will result in an error. - 5. Finally, `conn` is paired with the DuckDB `execute` method, where our SQL query (`sql_query`) is passed in as an argument. This tells the asset that, when materializing, to connect to the DuckDB database and execute the query in `sql_query`. + 5. Finally, `conn` is paired with the DuckDB `execute` method, where our SQL query (`query`) is passed in as an argument. This tells the asset that, when materializing, to connect to the DuckDB database and execute the query in `query`. 3. Save the changes to the file. @@ -98,9 +98,9 @@ This is because you’ve told Dagster that taxi_trips depends on the taxi_trips_ To confirm that the `taxi_trips` asset materialized properly, you can access the newly made `trips` table in DuckDB. In a new terminal session, open a Python REPL and run the following snippet: ```python -> import duckdb -> conn = duckdb.connect(database="data/staging/data.duckdb") # assumes you're writing to the same destination as specified in .env.example -> conn.execute("select count(*) from trips").fetchall() +import duckdb +conn = duckdb.connect(database="data/staging/data.duckdb") # assumes you're writing to the same destination as specified in .env.example +conn.execute("select count(*) from trips").fetchall() ``` The command should succeed and return a row count of the taxi trips that were ingested. When finished, make sure to stop the terminal process before continuing or you may encounter an error. Use `Control+C` or `Command+C` to stop the process. diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-6/using-resources-in-assets.md b/docs/dagster-university/pages/dagster-essentials/lesson-6/using-resources-in-assets.md index 7e27a7e555ab1..3af063b4515b0 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-6/using-resources-in-assets.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-6/using-resources-in-assets.md @@ -31,7 +31,7 @@ from dagster import asset deps=["taxi_trips_file"], ) def taxi_trips() -> None: - sql_query = """ + query = """ create or replace table taxi_trips as ( select VendorID as vendor_id, @@ -49,7 +49,7 @@ def taxi_trips() -> None: """ conn = duckdb.connect(os.getenv("DUCKDB_DATABASE")) - conn.execute(sql_query) + conn.execute(query) ``` --- @@ -72,7 +72,7 @@ from dagster import asset deps=["taxi_trips_file"], ) def taxi_trips(database: DuckDBResource) -> None: - sql_query = """ + query = """ create or replace table taxi_trips as ( select VendorID as vendor_id, @@ -90,7 +90,7 @@ def taxi_trips(database: DuckDBResource) -> None: """ with database.get_connection() as conn: - conn.execute(sql_query) + conn.execute(query) ``` To refactor `taxi_trips` to use the `database` resource, we had to: diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-7/what-are-schedules.md b/docs/dagster-university/pages/dagster-essentials/lesson-7/what-are-schedules.md index 7abd4f66c4963..f91b4cd4900f0 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-7/what-are-schedules.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-7/what-are-schedules.md @@ -92,7 +92,7 @@ Despite many schedulers and orchestrators replacing the cron program since then, Consider the following example: -```python +``` 15 5 * * 1-5 ``` diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-8/adding-partitions-to-assets.md b/docs/dagster-university/pages/dagster-essentials/lesson-8/adding-partitions-to-assets.md index cdfdd00cf041e..c44c2b93eafc9 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-8/adding-partitions-to-assets.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-8/adding-partitions-to-assets.md @@ -63,7 +63,7 @@ To add the partition to the asset: @asset( partitions_def=monthly_partition ) - def taxi_trips_file(context) -> None: + def taxi_trips_file(context: AssetExecutionContext) -> None: partition_date_str = context.partition_key ``` @@ -73,7 +73,7 @@ To add the partition to the asset: @asset( partitions_def=monthly_partition ) - def taxi_trips_file(context) -> None: + def taxi_trips_file(context: AssetExecutionContext) -> None: partition_date_str = context.partition_key month_to_fetch = partition_date_str[:-3] ``` @@ -86,7 +86,7 @@ from ..partitions import monthly_partition @asset( partitions_def=monthly_partition ) -def taxi_trips_file(context) -> None: +def taxi_trips_file(context: AssetExecutionContext) -> None: """ The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal. """ diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-8/coding-practice-partition-taxi-trips.md b/docs/dagster-university/pages/dagster-essentials/lesson-8/coding-practice-partition-taxi-trips.md index f2aa0b5ffc093..606ee4b3847c0 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-8/coding-practice-partition-taxi-trips.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-8/coding-practice-partition-taxi-trips.md @@ -17,7 +17,7 @@ To practice what you’ve learned, partition the `taxi_trips` asset by month usi {% callout %} You’ll need to drop the existing `taxi_trips` because of the new `partition_date` column. In a Python REPL or scratch script, run the following: - ```yaml + ``` import duckdb conn = duckdb.connect(database="data/staging/data.duckdb") conn.execute("drop table trips;") diff --git a/docs/dagster-university/pages/dagster-essentials/lesson-8/creating-a-schedule-with-a-date-based-partition.md b/docs/dagster-university/pages/dagster-essentials/lesson-8/creating-a-schedule-with-a-date-based-partition.md index a6327cb6ad954..cd8793b6755fe 100644 --- a/docs/dagster-university/pages/dagster-essentials/lesson-8/creating-a-schedule-with-a-date-based-partition.md +++ b/docs/dagster-university/pages/dagster-essentials/lesson-8/creating-a-schedule-with-a-date-based-partition.md @@ -36,12 +36,14 @@ To add partition to the job, make the following changes: The job should now look like this: ```python -from dagster import define_asset_job, AssetSelection, AssetKey +from dagster import define_asset_job, AssetSelection from ..partitions import monthly_partition +trips_by_week = AssetSelection.assets("trips_by_week") + trip_update_job = define_asset_job( name="trip_update_job", partitions_def=monthly_partition, # partitions added here - selection=AssetSelection.all() - AssetSelection.assets(["trips_by_week"]) + selection=AssetSelection.all() - trips_by_week ) ```