-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[du] consistency issues #26416
[du] consistency issues #26416
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,7 +30,7 @@ The asset you built should look similar to the following code. Click **View answ | |
deps=["taxi_zones_file"] | ||
) | ||
def taxi_zones() -> None: | ||
sql_query = f""" | ||
query = f""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Standardizing around |
||
create or replace table zones as ( | ||
select | ||
LocationID as zone_id, | ||
|
@@ -42,5 +42,5 @@ def taxi_zones() -> None: | |
""" | ||
|
||
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE")) | ||
conn.execute(sql_query) | ||
conn.execute(query) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,7 +25,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage | |
""" | ||
The raw taxi trips dataset, loaded into a DuckDB database | ||
""" | ||
sql_query = """ | ||
query = """ | ||
create or replace table trips as ( | ||
select | ||
VendorID as vendor_id, | ||
|
@@ -43,7 +43,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage | |
""" | ||
|
||
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE")) | ||
conn.execute(sql_query) | ||
conn.execute(query) | ||
``` | ||
|
||
Let’s walk through what this code does: | ||
|
@@ -52,13 +52,13 @@ Now that you have a query that produces an asset, let’s use Dagster to manage | |
|
||
2. The `taxi_trips_file` asset is defined as a dependency of `taxi_trips` through the `deps` argument. | ||
|
||
3. Next, a variable named `sql_query` is created. This variable contains a SQL query that creates a table named `trips`, which sources its data from the `data/raw/taxi_trips_2023-03.parquet` file. This is the file created by the `taxi_trips_file` asset. | ||
3. Next, a variable named `query` is created. This variable contains a SQL query that creates a table named `trips`, which sources its data from the `data/raw/taxi_trips_2023-03.parquet` file. This is the file created by the `taxi_trips_file` asset. | ||
|
||
4. A variable named `conn` is created, which defines the connection to the DuckDB database in the project. To do this, it uses the `.connect` method from the `duckdb` library, passing in the `DUCKDB_DATABASE` environment variable to tell DuckDB where the database is located. | ||
|
||
The `DUCKDB_DATABASE` environment variable, sourced from your project’s `.env` file, resolves to `data/staging/data.duckdb`. **Note**: We set up this file in Lesson 2 - refer to this lesson if you need a refresher. If this file isn’t set up correctly, the materialization will result in an error. | ||
|
||
5. Finally, `conn` is paired with the DuckDB `execute` method, where our SQL query (`sql_query`) is passed in as an argument. This tells the asset that, when materializing, to connect to the DuckDB database and execute the query in `sql_query`. | ||
5. Finally, `conn` is paired with the DuckDB `execute` method, where our SQL query (`query`) is passed in as an argument. This tells the asset that, when materializing, to connect to the DuckDB database and execute the query in `query`. | ||
|
||
3. Save the changes to the file. | ||
|
||
|
@@ -98,9 +98,9 @@ This is because you’ve told Dagster that taxi_trips depends on the taxi_trips_ | |
To confirm that the `taxi_trips` asset materialized properly, you can access the newly made `trips` table in DuckDB. In a new terminal session, open a Python REPL and run the following snippet: | ||
|
||
```python | ||
> import duckdb | ||
> conn = duckdb.connect(database="data/staging/data.duckdb") # assumes you're writing to the same destination as specified in .env.example | ||
> conn.execute("select count(*) from trips").fetchall() | ||
import duckdb | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removing |
||
conn = duckdb.connect(database="data/staging/data.duckdb") # assumes you're writing to the same destination as specified in .env.example | ||
conn.execute("select count(*) from trips").fetchall() | ||
``` | ||
|
||
The command should succeed and return a row count of the taxi trips that were ingested. When finished, make sure to stop the terminal process before continuing or you may encounter an error. Use `Control+C` or `Command+C` to stop the process. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -63,7 +63,7 @@ To add the partition to the asset: | |
@asset( | ||
partitions_def=monthly_partition | ||
) | ||
def taxi_trips_file(context) -> None: | ||
def taxi_trips_file(context: AssetExecutionContext) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. First example snippet uses type annotation. Including for consistency |
||
partition_date_str = context.partition_key | ||
``` | ||
|
||
|
@@ -73,7 +73,7 @@ To add the partition to the asset: | |
@asset( | ||
partitions_def=monthly_partition | ||
) | ||
def taxi_trips_file(context) -> None: | ||
def taxi_trips_file(context: AssetExecutionContext) -> None: | ||
partition_date_str = context.partition_key | ||
month_to_fetch = partition_date_str[:-3] | ||
``` | ||
|
@@ -86,7 +86,7 @@ from ..partitions import monthly_partition | |
@asset( | ||
partitions_def=monthly_partition | ||
) | ||
def taxi_trips_file(context) -> None: | ||
def taxi_trips_file(context: AssetExecutionContext) -> None: | ||
""" | ||
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding in example since it was the only bullet point without one