-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[13/n][dagster-airbyte] Implement airbyte_assets and build_airbyte_assets_definitions #26432
Conversation
4b9b2c1
to
6585306
Compare
e1ec6de
to
3a2fbe4
Compare
6585306
to
e8d86fc
Compare
3a2fbe4
to
b534c41
Compare
e8d86fc
to
15f5b2b
Compare
b534c41
to
71c99c0
Compare
71c99c0
to
3641abe
Compare
name="airbyte_connection_id", | ||
group_name="airbyte_connection_id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these feel like bad filler values. pls update.
Also, is that op_name, and dagster asset group_name? If so, I don't like their inclusion since they should both be optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 827e42d
return default_spec.replace_attributes( | ||
key=asset_spec.key.with_prefix("my_prefix"), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we encourage people to use asset key prefix anymore. Can we change some other property, or use some sort of other asset key transformation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated for metadata in 827e42d
name="airbyte_connection_id", | ||
group_name="airbyte_connection_id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing here, don't think these are 100% necessary right
python_modules/libraries/dagster-airbyte/dagster_airbyte/asset_decorator.py
Show resolved
Hide resolved
return multi_asset( | ||
name=name, | ||
group_name=group_name, | ||
can_subset=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we setting this to false? Seems like the subset case should be configurable at least right? If the user does things properly I don't see why we couldn't support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be set to true - I see that I changed it in the next PR instead of here. Will fix
python_modules/libraries/dagster-airbyte/dagster_airbyte/asset_defs.py
Outdated
Show resolved
Hide resolved
) | ||
|
||
def sync_and_poll( | ||
self, context: Optional[Union[OpExecutionContext, AssetExecutionContext]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point shouldn't it always be AssetExecutionContext?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the implementation, you're right. In it's current state, sync_and_poll
couldn't be used in an op.
Is this something we want to support though? cc @yuhan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep OpExecutionContext
though - we should raise an error if users are trying to use it in an op. Will add in the implementation in the subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say we don’t support op and throw errors if someone tries to use it in ops. This keeps the surface area small, and we can always loosen it later (it’s better to start restrictive and loosen over time than the opposite).
@maximearmstrong I'm curious the reason why you think we should keep OpExecutionContext. I'm with Chris - I thought at this point, we'd expect the context to be always AssetExecutionContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuhan Yeah, in hindsight I also think removing OpExecutionContext
is best. My first thought was that all other integrations accept OpExecutionContext
in their materialization methods, but I think this will only be more confusing for users. Let's just emphasize in the docstring that this method can be used with assets only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users may not be feeding OpExecutionContext anyways.
We kept both in other ones likely for backcompat. But we introduced a breaking change in 1.9 here saying
If you have written helper functions with OpExecutionContext type annotations, they may need to be updated to include AssetExecutionContext depending on your usage.
So I think for net new changes, we can leave out op context.
Besides, that makes me wonder if we should start deprecating all OpExecutionContext from other integrations' materialization methods. This is defn out of scope here, but worth considering as a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good context, thanks!
Besides, that makes me wonder if we should start deprecating all OpExecutionContext from other integrations' materialization methods. This is defn out of scope here, but worth considering as a followup.
I will make the change for the reworked Fivetran integration, and open a ticket for the other ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 9636e66
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna request changes here; some docstring improvements that I think are necessary, and I also think that we should support is_subsettable out of the box. At the very least, our pre-built load_asset_from_airbyte
should support arbitrary subsetting.
1b64ddb
to
860d858
Compare
3641abe
to
aca5ae6
Compare
860d858
to
c7fbd62
Compare
aca5ae6
to
3e9f644
Compare
c7fbd62
to
5caf85f
Compare
f888a06
to
90848ca
Compare
9636e66
to
c48ad8a
Compare
…pace.sync_and_poll (#26585) ## Summary & Motivation Following [this discussion](#26432 (comment)) for Airbyte Cloud - `FivetranWorkspace.sync_and_poll` should only accept context of type `AssetExecutionContext`. ## How I Tested These Changes Same tests with BK
client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"), | ||
) | ||
|
||
# load asset specs a first time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird grammar here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in e13a65e
workspace.load_asset_specs() | ||
assert len(fetch_workspace_data_api_mocks.calls) == 4 | ||
|
||
# load asset specs a first time, no additional calls are made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird grammar here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few minor comments looks good otherwise.
90848ca
to
94bb6cc
Compare
c48ad8a
to
7c0766a
Compare
94bb6cc
to
7009a22
Compare
7c0766a
to
51e95f1
Compare
7009a22
to
fa56484
Compare
51e95f1
to
25ad47c
Compare
25ad47c
to
e13a65e
Compare
Summary & Motivation
This PR implements the
airbyte_assets
decorator and thebuild_airbyte_assets_definitions
factory.airbyte_assets
can be used to create all assets for a given Airbyte connection, i.e. one asset per table in the connection.build_airbyte_assets_definitions
can be used to create all Airbyte assets defs, one per connection. It usesairbyte_assets
.These can be used with
AirbyteCloudWorkspace
, and the goal would be to use these with the workspace for Airbyte OSS as well.How I Tested These Changes
Additional unit tests with BK.
Changelog
[dagster-airbyte] The
airbyte_assets
decorator is added. It can be used with theAirbyteCloudWorkspace
resource andDagsterAirbyteTranslator
translator to load Airbyte tables for a given connection as assets in Dagster. Thebuild_airbyte_assets_definitions
factory can be used to create assets for all the connections in your Airbyte workspace.