[5/n][dagster-airbyte] Implement fetch_airbyte_workspace_data #26253

Merged
merged 5 commits into from
Dec 5, 2024

Conversation

maximearmstrong
Contributor

@maximearmstrong maximearmstrong commented Dec 3, 2024

Summary & Motivation

This PR implements AirbyteCloudWorkspace.fetch_airbyte_workspace_data, which fetches the connections and destinations included in a given workspace.

How I Tested These Changes

Additional unit tests with BK.

@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from bfade38 to 1520501 Compare December 4, 2024 05:29
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from 4be30a4 to 61380f0 Compare December 4, 2024 05:29
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from 1520501 to f295f42 Compare December 4, 2024 15:30
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from 61380f0 to 726f64a Compare December 4, 2024 15:30
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from f295f42 to c3daf88 Compare December 4, 2024 19:16
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch 2 times, most recently from 3a8a668 to 7b1189b Compare December 4, 2024 19:20
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from 7eb0fde to 224d89c Compare December 4, 2024 20:11
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch 2 times, most recently from f9cf32c to d231cae Compare December 4, 2024 20:20

@whitelist_for_serdes
@record
class AirbyteStream:
Contributor Author

@maximearmstrong maximearmstrong Dec 4, 2024

A stream in Airbyte corresponds to a table. This could also be called AirbyteTable, but I kept Airbyte's ontology here.

Contributor

Maybe note that in the docstring, but makes sense.
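
For illustration, the docstring note suggested above might look like the sketch below. This is not the class from the PR: it uses a plain stdlib dataclass instead of dagster's `@record` and `@whitelist_for_serdes`, and the class name `StreamRecord` and its fields (`name`, `selected`, `json_schema`) are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class StreamRecord:
    """A single stream of an Airbyte connection.

    In Airbyte's terminology a stream corresponds to a table, so this
    could equally be thought of as a table record. The name follows
    Airbyte's ontology.
    """

    # All three fields are hypothetical; the real record may differ.
    name: str
    selected: bool
    json_schema: Mapping[str, Any]


# Build one record to show the shape of the data.
stream = StreamRecord(name="orders", selected=True, json_schema={"type": "object"})
```

Keeping the Airbyte term in the class name and the "table" equivalence in the docstring lets readers coming from either vocabulary find it.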

@maximearmstrong maximearmstrong self-assigned this Dec 4, 2024
@maximearmstrong maximearmstrong marked this pull request as ready for review December 4, 2024 20:37
Comment on lines +1043 to +1044
connection_id=partial_connection_details["connectionId"]
)
connection = AirbyteConnection.from_connection_details(
connection_details=full_connection_details
)
connections_by_id[connection.id] = connection
Contributor

Is there any case where get_connection_details shouldn't just return AirbyteConnection? Similar for destinations. Just feels like it would simplify this loop quite a bit.

Contributor Author

@maximearmstrong maximearmstrong Dec 5, 2024

We keep only what is needed to translate data to AssetSpec in AirbyteConnection and AirbyteDestination, but users could take the client and call get_connections on it, something like:

from dagster_airbyte import AirbyteCloudWorkspace, airbyte_assets

import dagster as dg

airbyte_cloud_workspace = AirbyteCloudWorkspace(
    workspace_id=dg.EnvVar("AIRBYTE_CLOUD_WORKSPACE_ID"),
    client_id=dg.EnvVar("AIRBYTE_CLOUD_CLIENT_ID"),
    client_secret=dg.EnvVar("AIRBYTE_CLOUD_CLIENT_SECRET"),
)

@airbyte_assets(
    connection_id="connection_id",
    workspace=airbyte_cloud_workspace,
)
def airbyte_connection_assets(context: dg.AssetExecutionContext, airbyte: AirbyteCloudWorkspace):
    client = airbyte.get_client()
    connections = client.get_connections()
    # do something with connections
    ...
    yield from airbyte.sync_and_poll(context=context)

In this case, they may want something in the raw API response.
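
The trade-off discussed in this thread can be sketched as follows: the client keeps returning raw API payloads, while the fetch loop converts each payload into a trimmed record. This is a minimal, self-contained sketch and not the dagster-airbyte implementation. `StubClient`, `ConnectionRecord`, `fetch_connections_by_id`, and the payload keys other than `connectionId` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping


@dataclass(frozen=True)
class ConnectionRecord:
    """Trimmed view of a connection (hypothetical subset of fields)."""

    id: str
    name: str
    destination_id: str

    @classmethod
    def from_connection_details(
        cls, connection_details: Mapping[str, Any]
    ) -> "ConnectionRecord":
        # Keep only what downstream translation needs; drop the rest.
        return cls(
            id=connection_details["connectionId"],
            name=connection_details["name"],
            destination_id=connection_details["destinationId"],
        )


class StubClient:
    """Stand-in for the API client; returns canned raw payloads."""

    def __init__(self, payloads: Mapping[str, Mapping[str, Any]]):
        self._payloads = payloads

    def get_connections(self) -> List[Mapping[str, Any]]:
        # Listing returns partial details only (here: just the id).
        return [{"connectionId": cid} for cid in self._payloads]

    def get_connection_details(self, connection_id: str) -> Mapping[str, Any]:
        # Raw response, untouched, so callers can read any field.
        return self._payloads[connection_id]


def fetch_connections_by_id(client: StubClient) -> Dict[str, ConnectionRecord]:
    connections_by_id: Dict[str, ConnectionRecord] = {}
    for partial in client.get_connections():
        full = client.get_connection_details(connection_id=partial["connectionId"])
        connection = ConnectionRecord.from_connection_details(connection_details=full)
        connections_by_id[connection.id] = connection
    return connections_by_id


raw = {
    "c1": {
        "connectionId": "c1",
        "name": "orders",
        "destinationId": "d1",
        "schedule": {"scheduleType": "manual"},  # not kept in the record
    }
}
client = StubClient(raw)
connections = fetch_connections_by_id(client)
```

With this split, code that only needs the trimmed record uses `connections`, and code that needs something else from the response (the `schedule` entry in this sketch) can still call the client directly.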

Contributor

@dpeng817 dpeng817 left a comment

potential comment for code simplification but lgtm

@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from 414214d to d19da64 Compare December 5, 2024 20:53
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from d231cae to 4054ff8 Compare December 5, 2024 20:54
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from d19da64 to 59f1db9 Compare December 5, 2024 21:18
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from 4054ff8 to da101d9 Compare December 5, 2024 21:18
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from 59f1db9 to 2ffd65b Compare December 5, 2024 22:13
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from da101d9 to 874b666 Compare December 5, 2024 22:13
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-4 branch from 2ffd65b to 35bc6f4 Compare December 5, 2024 22:59
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from 874b666 to 3ca1897 Compare December 5, 2024 22:59
Base automatically changed from maxime/rework-airbyte-cloud-4 to master December 5, 2024 23:12
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-5 branch from 3ca1897 to 977b2d4 Compare December 5, 2024 23:14
@maximearmstrong maximearmstrong merged commit 52f84fe into master Dec 5, 2024
1 check passed
@maximearmstrong maximearmstrong deleted the maxime/rework-airbyte-cloud-5 branch December 5, 2024 23:32
pskinnerthyme pushed a commit to pskinnerthyme/dagster that referenced this pull request Dec 16, 2024
…r-io#26253)