Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[13/n][dagster-airbyte] Implement airbyte_assets and build_airbyte_assets_definitions #26432

Merged
merged 5 commits into from
Dec 26, 2024

Conversation

maximearmstrong
Copy link
Contributor

@maximearmstrong maximearmstrong commented Dec 12, 2024

Summary & Motivation

This PR implements the airbyte_assets decorator and the build_airbyte_assets_definitions factory.

  • airbyte_assets can be used to create all assets for a given Airbyte connection, i.e. one asset per table in the connection.
  • build_airbyte_assets_definitions can be used to create all Airbyte assets defs, one per connection. It uses airbyte_assets.

These can be used with AirbyteCloudWorkspace, and the goal would be to use these with the workspace for Airbyte OSS as well.

How I Tested These Changes

Additional unit tests with BK.

Changelog

[dagster-airbyte] The airbyte_assets decorator is added. It can be used with the AirbyteCloudWorkspace resource and DagsterAirbyteTranslator translator to load Airbyte tables for a given connection as assets in Dagster. The build_airbyte_assets_definitions factory can be used to create assets for all the connections in your Airbyte workspace.

@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 4b9b2c1 to 6585306 Compare December 13, 2024 00:04
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from e1ec6de to 3a2fbe4 Compare December 13, 2024 00:04
@maximearmstrong maximearmstrong changed the title [dagster-airbyte] Implement airbyte_assets and build_airbyte_assets_definitions [13/n][dagster-airbyte] Implement airbyte_assets and build_airbyte_assets_definitions Dec 16, 2024
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 6585306 to e8d86fc Compare December 17, 2024 23:13
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 3a2fbe4 to b534c41 Compare December 17, 2024 23:13
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from e8d86fc to 15f5b2b Compare December 18, 2024 00:48
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from b534c41 to 71c99c0 Compare December 18, 2024 00:48
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 71c99c0 to 3641abe Compare December 18, 2024 01:07
@maximearmstrong maximearmstrong marked this pull request as ready for review December 18, 2024 01:22
@maximearmstrong maximearmstrong self-assigned this Dec 18, 2024
Comment on lines 48 to 49
name="airbyte_connection_id",
group_name="airbyte_connection_id",
Copy link
Contributor

@dpeng817 dpeng817 Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these feel like bad filler values. pls update.
Also, is that op_name, and dagster asset group_name? If so, I don't like their inclusion since they should both be optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 827e42d

return default_spec.replace_attributes(
key=asset_spec.key.with_prefix("my_prefix"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we encourage people to use asset key prefix anymore. Can we change some other property, or use some sort of other asset key transformation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for metadata in 827e42d

Comment on lines 90 to 91
name="airbyte_connection_id",
group_name="airbyte_connection_id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here, don't think these are 100% necessary right

return multi_asset(
name=name,
group_name=group_name,
can_subset=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we setting this to false? Seems like the subset case should be configurable at least right? If the user does things properly I don't see why we couldn't support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be set to true - I see that I changed it in the next PR instead of here. Will fix

)

def sync_and_poll(
self, context: Optional[Union[OpExecutionContext, AssetExecutionContext]] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point shouldn't it always be AssetExecutionContext?

Copy link
Contributor Author

@maximearmstrong maximearmstrong Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation, you're right. In it's current state, sync_and_poll couldn't be used in an op.

Is this something we want to support though? cc @yuhan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep OpExecutionContext though - we should raise an error if users are trying to use it in an op. Will add in the implementation in the subsequent PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we don’t support op and throw errors if someone tries to use it in ops. This keeps the surface area small, and we can always loosen it later (it’s better to start restrictive and loosen over time than the opposite).

@maximearmstrong I'm curious the reason why you think we should keep OpExecutionContext. I'm with Chris - I thought at this point, we'd expect the context to be always AssetExecutionContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuhan Yeah, in hindsight I also think removing OpExecutionContext is best. My first thought was that all other integrations accept OpExecutionContext in their materialization methods, but I think this will only be more confusing for users. Let's just emphasize in the docstring that this method can be used with assets only.

Copy link
Contributor

@yuhan yuhan Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users may not be feeding OpExecutionContext anyways.

We kept both in other ones likely for backcompat. But we introduced a breaking change in 1.9 here saying

If you have written helper functions with OpExecutionContext type annotations, they may need to be updated to include AssetExecutionContext depending on your usage.

So I think for net new changes, we can leave out op context.

Besides, that makes me wonder if we should start deprecating all OpExecutionContext from other integrations' materialization methods. This is defn out of scope here, but worth considering as a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good context, thanks!

Besides, that makes me wonder if we should start deprecating all OpExecutionContext from other integrations' materialization methods. This is defn out of scope here, but worth considering as a followup.

I will make the change for the reworked Fivetran integration, and open a ticket for the other ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9636e66

Copy link
Contributor

@dpeng817 dpeng817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna request changes here; some docstring improvements that I think are necessary, and I also think that we should support is_subsettable out of the box. At the very least, our pre-built load_asset_from_airbyte should support arbitrary subsetting.

@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 1b64ddb to 860d858 Compare December 18, 2024 22:42
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 3641abe to aca5ae6 Compare December 18, 2024 22:42
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 860d858 to c7fbd62 Compare December 19, 2024 01:19
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from aca5ae6 to 3e9f644 Compare December 19, 2024 01:19
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from c7fbd62 to 5caf85f Compare December 19, 2024 03:29
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from f888a06 to 90848ca Compare December 19, 2024 15:50
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 9636e66 to c48ad8a Compare December 19, 2024 15:50
maximearmstrong added a commit that referenced this pull request Dec 20, 2024
…pace.sync_and_poll (#26585)

## Summary & Motivation

Following [this
discussion](#26432 (comment))
for Airbyte Cloud - `FivetranWorkspace.sync_and_poll` should only accept
context of type `AssetExecutionContext`.

## How I Tested These Changes

Same tests with BK
client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"),
)

# load asset specs a first time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird grammar here

Copy link
Contributor Author

@maximearmstrong maximearmstrong Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in e13a65e

workspace.load_asset_specs()
assert len(fetch_workspace_data_api_mocks.calls) == 4

# load asset specs a first time, no additional calls are made
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird grammar here as well

Copy link
Contributor

@dpeng817 dpeng817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor comments looks good otherwise.

@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 90848ca to 94bb6cc Compare December 26, 2024 15:30
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from c48ad8a to 7c0766a Compare December 26, 2024 15:30
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 94bb6cc to 7009a22 Compare December 26, 2024 15:45
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 7c0766a to 51e95f1 Compare December 26, 2024 15:45
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-12 branch from 7009a22 to fa56484 Compare December 26, 2024 16:08
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 51e95f1 to 25ad47c Compare December 26, 2024 16:08
Base automatically changed from maxime/rework-airbyte-cloud-12 to master December 26, 2024 16:31
@maximearmstrong maximearmstrong force-pushed the maxime/rework-airbyte-cloud-13 branch from 25ad47c to e13a65e Compare December 26, 2024 16:43
@maximearmstrong maximearmstrong merged commit 594d686 into master Dec 26, 2024
1 check passed
@maximearmstrong maximearmstrong deleted the maxime/rework-airbyte-cloud-13 branch December 26, 2024 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants