[10/n][dagster-fivetran] Implement fivetran_assets and build_fivetran_assets_definitions #25944

maximearmstrong · 2024-11-15T02:12:43Z

Summary & Motivation

This PR implements the fivetran_assets decorator and the build_fivetran_assets_definitions factory.

- fivetran_assets can be used to create all assets for a given Fivetran connector, i.e. one asset per table in the connector.
- build_fivetran_assets_definitions can be used to create all Fivetran assets definitions, one per connector. It uses fivetran_assets.
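The relationship between the decorator and the factory can be sketched in plain Python. This is only an illustration of the pattern, not the dagster-fivetran implementation: `CONNECTORS`, `AssetsDef`, `table_assets` and `build_all_assets_definitions` are hypothetical stand-ins, and the connector metadata is hard-coded instead of being fetched from the Fivetran API.

```python
from typing import Callable, Dict, List

# Hypothetical workspace metadata: connector id -> table names.
CONNECTORS: Dict[str, List[str]] = {
    "conn_a": ["orders", "customers"],
    "conn_b": ["events"],
}


class AssetsDef:
    """Hypothetical container: one definition holding one asset key per table."""

    def __init__(self, connector_id: str, asset_keys: List[str], fn: Callable):
        self.connector_id = connector_id
        self.asset_keys = asset_keys
        self.fn = fn


def table_assets(connector_id: str) -> Callable[[Callable], AssetsDef]:
    """Decorator: turn a function into a definition with one asset per table."""

    def wrap(fn: Callable) -> AssetsDef:
        keys = [f"{connector_id}/{table}" for table in CONNECTORS[connector_id]]
        return AssetsDef(connector_id, keys, fn)

    return wrap


def build_all_assets_definitions() -> List[AssetsDef]:
    """Factory: apply the decorator once per connector in the workspace."""
    defs = []
    for connector_id in CONNECTORS:

        # Bind connector_id as a default so each function keeps its own value.
        @table_assets(connector_id)
        def _sync(connector_id: str = connector_id) -> str:
            return f"synced {connector_id}"

        defs.append(_sync)
    return defs


all_defs = build_all_assets_definitions()
```

In this sketch the factory adds nothing beyond looping over connectors and applying the decorator, which matches the description of build_fivetran_assets_definitions as a thin layer over fivetran_assets.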

Both the asset decorator and factory use load_fivetran_asset_specs. This is motivated by the current implementation of dagster-dbt, dagster-dlt and dagster-sling - each leverages an asset decorator that loads the asset specs by itself.

To avoid calling the Fivetran API each time load_fivetran_asset_specs is called, it is cached using functools.lru_cache. load_fivetran_asset_specs uses the state-backed defs, so reloading the code won't make additional calls to the Fivetran API. Without the cache, however, calling load_fivetran_asset_specs N times in a script would make N calls to the Fivetran API.

The goals here are:

- make the Fivetran integration as similar as possible to the other ELT integrations by using the same patterns, e.g. an asset decorator
- make the user experience as simple as possible, so that users don't have to manage the asset specs or the number of calls to the Fivetran API.

How I Tested These Changes

Additional unit tests with BK.

Changelog

[dagster-fivetran] The fivetran_assets decorator is added. It can be used with the FivetranWorkspace resource and DagsterFivetranTranslator translator to load Fivetran tables for a given connector as assets in Dagster. The build_fivetran_assets_definitions factory can be used to create assets for all the connectors in your Fivetran workspace.

maximearmstrong · 2024-11-15T15:47:21Z

python_modules/libraries/dagster-fivetran/dagster_fivetran/resources.py

+    def sync_and_poll(
+        self, context: Optional[Union[OpExecutionContext, AssetExecutionContext]] = None
+    ):
+        raise NotImplementedError()
+
+    def __eq__(self, other):
+        return (
+            isinstance(other, FivetranWorkspace)
+            and self.account_id == other.account_id
+            and self.api_key == other.api_key
+            and self.api_secret == other.api_secret
+        )
+
+    def __hash__(self):
+        return hash((self.account_id + self.api_key + self.api_secret))
+

+@lru_cache(maxsize=None)


Caching load_fivetran_asset_specs with functools.lru_cache requires FivetranWorkspace to be hashable.
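The hashability requirement can be illustrated with a small standalone sketch. `Workspace`, `NoHash` and `load_specs` are hypothetical stand-ins, not the dagster-fivetran classes. The sketch hashes a tuple of the fields rather than their concatenation, since concatenated strings can collide ("ab" + "c" equals "a" + "bc").

```python
from functools import lru_cache


class Workspace:
    """Hypothetical stand-in for a resource holding credentials."""

    def __init__(self, account_id: str, api_key: str, api_secret: str):
        self.account_id = account_id
        self.api_key = api_key
        self.api_secret = api_secret

    def _key(self) -> tuple:
        return (self.account_id, self.api_key, self.api_secret)

    def __eq__(self, other) -> bool:
        return isinstance(other, Workspace) and self._key() == other._key()

    def __hash__(self) -> int:
        # Hash the tuple so field boundaries are part of the hash input.
        return hash(self._key())


class NoHash:
    """Defining __eq__ without __hash__ makes instances unhashable."""

    def __eq__(self, other) -> bool:
        return isinstance(other, NoHash)


@lru_cache(maxsize=None)
def load_specs(workspace) -> tuple:
    return (f"{workspace.account_id}/table_a",)


a = Workspace("acct", "key", "secret")
b = Workspace("acct", "key", "secret")
load_specs(a)
load_specs(b)  # equal and same hash as `a`, so this is a cache hit

try:
    load_specs(NoHash())
    unhashable_rejected = False
except TypeError:
    # lru_cache needs to hash its arguments to build the cache key.
    unhashable_rejected = True
```

One consequence of this design is that two distinct but equal workspace objects share a cache entry, which is what makes the top-level cache useful across call sites.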

An alternative to caching load_fivetran_asset_specs would be to call it in a FivetranWorkspace cached property.

    class FivetranWorkspace(ConfigurableResource):
        ...

        @cached_property
        def asset_specs(
            self, dagster_fivetran_translator: Type[DagsterFivetranTranslator] = DagsterFivetranTranslator
        ) -> Sequence[AssetSpec]:
            return load_fivetran_asset_specs(
                workspace=self, dagster_fivetran_translator=dagster_fivetran_translator
            )

For context: following two earlier discussions (linked in the original comment), we decided to detach spec loading from the resources, which is why I cached load_fivetran_asset_specs instead of caching a property.

Tagging @benpankow @dpeng817 @schrockn @alangenfeld @OwenKephart for feedback on this specific comment and the PR description about caching the asset specs calls.

alangenfeld · 2024-11-15

I bias towards @cached_method on an object instance instead of @lru_cache(maxsize=None) on a top-level function. I think it's much easier to reason about when things are expected to be cached.

dpeng817 · 2024-11-18

We went with cached_property for the FivetranWorkspace equivalent in Airlift. I think that's likely the right move here as well.

maximearmstrong · 2024-11-22

Updated to a cached method in FivetranWorkspace in 501caf7
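For readers unfamiliar with the instance-level approach, here is a minimal hand-rolled sketch of a cached method. It does not use Dagster's own caching helper and is not the code from 501caf7; `Workspace`, `_fetch`, `load_asset_specs` and the `translator_name` argument are hypothetical stand-ins. The point is that the cache belongs to one object, so its lifetime and scope are easy to reason about.

```python
from typing import Dict, Tuple


class Workspace:
    """Hypothetical resource with a per-instance cache of loaded specs."""

    def __init__(self, account_id: str):
        self.account_id = account_id
        self.api_calls = 0  # counts simulated API requests
        self._specs_cache: Dict[str, Tuple[str, ...]] = {}

    def _fetch(self, translator_name: str) -> Tuple[str, ...]:
        """Stand-in for the real API request."""
        self.api_calls += 1
        return (f"{translator_name}:{self.account_id}/table_a",)

    def load_asset_specs(self, translator_name: str = "default") -> Tuple[str, ...]:
        """Cached method: results are stored on this instance, keyed by argument."""
        if translator_name not in self._specs_cache:
            self._specs_cache[translator_name] = self._fetch(translator_name)
        return self._specs_cache[translator_name]


first = Workspace("acct_1")
second = Workspace("acct_2")

first.load_asset_specs()
first.load_asset_specs()  # served from the cache on `first`
second.load_asset_specs()  # separate instance, separate cache
```

Unlike a module-level lru_cache, this variant does not require the resource to be hashable, and the cached data is released together with the instance.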

python_modules/libraries/dagster-fivetran/dagster_fivetran/asset_decorator.py

dpeng817

I... don't think we should add these. Reasoning:

- All other cases where we added new decorators were before we converged upon a new API design, and I think likely weren't considered holistically. The @dbt_assets decorator is an exception, but we didn't have AssetSpec yet. If we could go back and do it again, I have some doubts we would have added any of them (although there is some messy transformational stuff that it needs to do, so maybe I'm wrong).
- We can always add it later. But if we add it now we're stuck supporting it and eventually adding arguments for all the asset spec things (like group_name here). Eventually you'll need to support some sort of splitting API, etc. I just think it's better to point users towards a more genericizable solution.

I'm also not sure I understand why we need both the decorator and build_fivetran_assets_definitions? Is the idea that it just provides progressively easier starting points?

If the actual function implementations were doing more complicated stuff, maybe I would see the additional value, but these are pretty simple wrappers, so I think we should hold off.

…and state-backed defs (#26133)

Summary & Motivation

Updates load_fivetran_asset_specs() and state-backed definitions to accept an instance of `DagsterFivetranTranslator`. See more about the motivation in the original thread on #25944.

How I Tested These Changes

Additional unit tests to test custom translators with BK.

Changelog

[dagster-fivetran] `load_fivetran_asset_specs` is updated to accept an instance of `DagsterFivetranTranslator` or custom subclass.

This was referenced Nov 15, 2024

[8/n][dagster-fivetran] Implement FivetranConnector and FivetranDestination #25889

Merged

[9/n][dagster-fivetran] Implement base sync methods in FivetranClient #25911

Merged

graphite-app bot reviewed Nov 15, 2024

python_modules/libraries/dagster-fivetran/dagster_fivetran/asset_defs.py (resolved)

maximearmstrong changed the title ~~[10/n][dagster-fivetran] Implement fivetran_assets decorator and build_fivetran_assets_definitions factory~~ [10/n][dagster-fivetran] Implement fivetran_assets and build_fivetran_assets_definitions Nov 15, 2024

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch 3 times, most recently from 2500b07 to 3f0e9aa Compare November 15, 2024 14:44

graphite-app bot reviewed Nov 15, 2024

python_modules/libraries/dagster-fivetran/dagster_fivetran/resources.py (outdated, resolved)

maximearmstrong force-pushed the maxime/rework-fivetran-9 branch from bd27402 to 357ff3f Compare November 15, 2024 14:54

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from 6cbe4f4 to 0521c7f Compare November 15, 2024 14:54

maximearmstrong commented Nov 15, 2024

maximearmstrong marked this pull request as ready for review November 15, 2024 15:49

maximearmstrong self-assigned this Nov 15, 2024

maximearmstrong requested review from benpankow, dpeng817, schrockn, alangenfeld and OwenKephart November 15, 2024 15:49

maximearmstrong mentioned this pull request Nov 15, 2024

[11/n][dagster-fivetran] Implement materialization method in FivetranWorkspace #25961

Merged

dpeng817 reviewed Nov 18, 2024

python_modules/libraries/dagster-fivetran/dagster_fivetran/asset_decorator.py (outdated, resolved)

dpeng817 reviewed Nov 18, 2024

python_modules/libraries/dagster-fivetran/dagster_fivetran/asset_decorator.py (outdated, resolved)

dpeng817 previously requested changes Nov 18, 2024

maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from a773b5e to 74db3e2 Compare November 26, 2024 23:21

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from ebca057 to 4f7cd9a Compare November 26, 2024 23:21

maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from 74db3e2 to c18340e Compare November 27, 2024 13:32

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from 4f7cd9a to 5f1b3f8 Compare November 27, 2024 13:32

maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from c18340e to 3c9a56a Compare November 27, 2024 13:44

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from 5f1b3f8 to f78ad58 Compare November 27, 2024 13:44

maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from 3c9a56a to 48a3bb0 Compare November 27, 2024 14:05

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from f78ad58 to 44865a4 Compare November 27, 2024 14:05

maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from 48a3bb0 to cd041b0 Compare November 27, 2024 14:25

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from 44865a4 to a83dfc6 Compare November 27, 2024 14:26

Base automatically changed from maxime/implement-resync-and-poll-method-fivetran-client to master November 27, 2024 14:43

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch 3 times, most recently from 748298f to 538348f Compare November 27, 2024 20:14

maximearmstrong added 7 commits November 27, 2024 16:18

- 2e9d9e7 [10/n][dagster-fivetran] Implement fivetran_assets decorator and build_fivetran_assets_definitions factory
- 0670482 Hash tuple
- 74e043a Add cached method to load asset specs
- 18efee8 Update docstrings
- 1c342f2 Update tests
- 94a353a Use translator instance in asset decorator, factory and cached method
- e97963d Add build_fivetran_assets_definitions in __init__

maximearmstrong force-pushed the maxime/rework-fivetran-10 branch from 538348f to e97963d Compare November 27, 2024 21:18

maximearmstrong mentioned this pull request Nov 27, 2024

[dagster-fivetran] Implement get_columns_config_for_table in FivetranClient #26181

Merged

maximearmstrong merged commit 2e1fa7f into master Dec 5, 2024
1 check passed

maximearmstrong deleted the maxime/rework-fivetran-10 branch December 5, 2024 18:39

maximearmstrong mentioned this pull request Dec 20, 2024

[RFC][dagster-powerbi] Implement copy_with_context in DagsterPowerBITranslator #26617

Closed
