Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-airlift] create a custom operator for lakehouse example #23597

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

dpeng817
Copy link
Contributor

@dpeng817 dpeng817 commented Aug 12, 2024

As discussed offline, adds a custom operator for the lakehouse example. This should allow us to better demonstrate the step of "writing an operator to translate dagster code."

I think this leaves us with an unclear interstitial step for this type of asset. What's the "observe" phase look like here? Do we expect people to just provide an assetspec? We should sketch that out in a future PR.

@dpeng817 dpeng817 marked this pull request as ready for review August 12, 2024 22:58
@dpeng817 dpeng817 requested review from benpankow and schrockn August 12, 2024 22:59
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from 4ee3b85 to 60dfc57 Compare August 12, 2024 23:07
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from cec4498 to 6a7ceda Compare August 12, 2024 23:07
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from 60dfc57 to 531fac1 Compare August 12, 2024 23:17
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 6a7ceda to 38f4c05 Compare August 12, 2024 23:17
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from 531fac1 to ee43fba Compare August 13, 2024 16:40
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 38f4c05 to eb5ebf2 Compare August 13, 2024 16:40
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from ee43fba to f1266cb Compare August 13, 2024 18:01
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from eb5ebf2 to 7424d3b Compare August 13, 2024 18:01
Copy link
Member

@benpankow benpankow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a clarification q, but code seems good to me

@@ -33,10 +26,20 @@ def dbt_project_path() -> Path:
defs = build_defs_from_airflow_instance(
airflow_instance=airflow_instance,
orchestrated_defs=defs_from_factories(
PythonDefs(
CSVToDuckdbDefs(
Copy link
Member

@benpankow benpankow Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose of the custom DefsFactory here is to produce an API/user code matching the custom Airflow operator, is that right? e.g. there's no technical reason that there needs to be a 1:1 mapping of operator to defs factory - just making sure I understand this right.

It seems like a user could just use a PythonDefs here with a fn/lambda to define the inputs to load_csv_to_duckdb, but the downside is that there's a higher mental load in migrating that way since the user code ends up looking less similar moving airflow->Dagster?

load_iris = PythonOperator(
  task_id="load_iris", 
  dag=dag,
  python_callable=lambda: load_csv_to_duckdb(
    table_name="iris_lakehouse_table",
    csv_path=Path("iris.csv"),
    duckdb_path=Path(os.environ["AIRFLOW_HOME"]) / "jaffle_shop.duckdb",
    column_names=[
        "sepal_length_cm",
        "sepal_width_cm",
        "petal_length_cm",
        "petal_width_cm",
        "species",
    ],
    duckdb_schema="iris_dataset",
    duckdb_database_name="jaffle_shop",
  )
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that PythonDefs could be used here, in the same way that a PythonOperator could be used to easily achieve the same goal of the load_csv_etc operator we use. The goal is to make it explicitly clear this semi 1:1 mapping exists for your custom operators / provide a clear path forward. The example is contrived, but the pattern isn't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense 👍

@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from f1266cb to cccaa7e Compare August 13, 2024 20:53
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 7424d3b to 3af5db1 Compare August 13, 2024 20:53
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from cccaa7e to ad6f180 Compare August 13, 2024 21:14
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 3af5db1 to 2593ed9 Compare August 13, 2024 21:14
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from ad6f180 to 3b70d75 Compare August 13, 2024 21:45
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 2593ed9 to 4b66f55 Compare August 13, 2024 21:45
@dpeng817 dpeng817 force-pushed the dpeng817/dbt-example-migration branch from 3b70d75 to 006945f Compare August 13, 2024 22:11
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 4b66f55 to 1b56c18 Compare August 13, 2024 22:11
Base automatically changed from dpeng817/dbt-example-migration to master August 13, 2024 22:11
@dpeng817 dpeng817 force-pushed the dpeng817/new_operator_lakehouse branch from 1b56c18 to 6a50a2e Compare August 13, 2024 22:13
@dpeng817 dpeng817 merged commit 170faf5 into master Aug 13, 2024
1 check was pending
@dpeng817 dpeng817 deleted the dpeng817/new_operator_lakehouse branch August 13, 2024 22:14
PedramNavid pushed a commit that referenced this pull request Aug 14, 2024
)

As discussed offline, adds a custom operator for the lakehouse example.
This should allow us to better demonstrate the step of "writing an
operator to translate dagster code."

I think this leaves us with an unclear interstitial step for this type
of asset. What's the "observe" phase look like here? Do we expect people
to just provide an assetspec? We should sketch that out in a future PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants