-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dagster-airbyte] Support client_id
and client_secret
in AirbyteCloudResource
#23451
[dagster-airbyte] Support client_id
and client_secret
in AirbyteCloudResource
#23451
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @maximearmstrong and the rest of your teammates on Graphite |
client_id
and client_secret
in AirbyteCloudResource
e11cfb3
to
d9720b3
Compare
client_id
and client_secret
in AirbyteCloudResourceclient_id
and client_secret
in AirbyteCloudResource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
portal.airbyte.com will be deprecated next week, on August 15th, so I removed the previous api_key attribute without deprecation warning. Are we comfortable doing so? Considering that this approach will fail next week.
Seems OK to me since it's going to break this week 👍
Have we tested this new API against a live Cloud instance? Would be good to sanity check beyond the tests, which look good to me.
not self._access_token_value | ||
or not self._access_token_timestamp | ||
or self._access_token_timestamp | ||
<= datetime.timestamp(datetime.now() - timedelta(seconds=150)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider pulling out 150
seconds into a constant here and putting at top of file so it's easier to spot/edit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in f2fefcb
|
||
@property | ||
def api_base_url(self) -> str: | ||
return "https://api.airbyte.com/v1" | ||
|
||
@property | ||
def all_additional_request_params(self) -> Mapping[str, Any]: | ||
return {"headers": {"Authorization": f"Bearer {self.api_key}", "User-Agent": "dagster"}} | ||
# Make sure the access token is refreshed before using it when calling the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a @property
, we should avoid doing real work inside the fn body (like refreshing the access token). One different approach could be be to pull out the refresh into a separate fn like _get_or_refresh_access_token()
so that it's clear that invoking it may have a cost, and call that directly in make_request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to make_request
in f2fefcb
self._access_token_value = str(response["access_token"]) | ||
self._access_token_timestamp = datetime.now().timestamp() | ||
|
||
def _needs_refreshed_access_token(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior makes sense to me, this is low enough complexity that it feels ok to abstract from the user
865a662
to
f2fefcb
Compare
@benpankow It was tested against a live Cloud Instance. I've updated the PR description to add the code and result. |
…loudResource (#23451) ## Summary & Motivation This PR updates the `AirbyteCloudResource` to support `client_id` and `client_secret` for authentication. The `api_key` can no longer be used for authentication because Airbyte is [deprecating portal.airbyte.com](https://reference.airbyte.com/reference/portalairbytecom-deprecation). Tests and docs are updated to reflect the changes. Two main questions for reviewers: - this PR implements a pattern where the access token is refreshed before making a call to the API, if the token was fetched more than 2.5 minutes ago. Should we avoid this pattern and let users manage the resource lifecycle? - [According to Airbyte](https://reference.airbyte.com/reference/portalairbytecom-deprecation), the access token expires after 3 minutes. - The access token is initially fetched in `setup_for_execution`, then refreshed if needed. I'm concerned that for jobs including other assets, it might take more than 3 minutes before the Airbyte assets are materialized. - [portal.airbyte.com will be deprecated next week](https://reference.airbyte.com/reference/portalairbytecom-deprecation), on August 15th, so I removed the previous `api_key` attribute without deprecation warning. Are we comfortable doing so? Considering that this approach will fail next week. ## How I Tested These Changes BK with updated tests Fully tested on a live cloud instance with this code: ```python from dagster import Definitions, EnvVar from dagster_airbyte import AirbyteCloudResource, build_airbyte_assets airbyte_instance = AirbyteCloudResource( client_id=EnvVar("AIRBYTE_CLIENT_ID"), client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"), ) airbyte_assets = build_airbyte_assets( # Test connection - Sample Data (Faker) to Google Sheets connection_id="0bb7a00c-0b85-4fac-b8ff-67dc380f1c29", destination_tables=["products", "purchases", "users"], ) defs = Definitions(assets=airbyte_assets, resources={"airbyte": airbyte_instance}) ``` Job successful: <img width="1037" alt="Screenshot 2024-08-13 at 4 30 12 PM" src="https://github.com/user-attachments/assets/46c91c95-b597-48de-bcbd-8bbb38f5f6d4"> Asset graph: <img width="1037" alt="Screenshot 2024-08-13 at 4 30 38 PM" src="https://github.com/user-attachments/assets/eae8ab12-0da9-4956-8ae6-10932cd0c199">
…loudResource (#23451) ## Summary & Motivation This PR updates the `AirbyteCloudResource` to support `client_id` and `client_secret` for authentication. The `api_key` can no longer be used for authentication because Airbyte is [deprecating portal.airbyte.com](https://reference.airbyte.com/reference/portalairbytecom-deprecation). Tests and docs are updated to reflect the changes. Two main questions for reviewers: - this PR implements a pattern where the access token is refreshed before making a call to the API, if the token was fetched more than 2.5 minutes ago. Should we avoid this pattern and let users manage the resource lifecycle? - [According to Airbyte](https://reference.airbyte.com/reference/portalairbytecom-deprecation), the access token expires after 3 minutes. - The access token is initially fetched in `setup_for_execution`, then refreshed if needed. I'm concerned that for jobs including other assets, it might take more than 3 minutes before the Airbyte assets are materialized. - [portal.airbyte.com will be deprecated next week](https://reference.airbyte.com/reference/portalairbytecom-deprecation), on August 15th, so I removed the previous `api_key` attribute without deprecation warning. Are we comfortable doing so? Considering that this approach will fail next week. ## How I Tested These Changes BK with updated tests Fully tested on a live cloud instance with this code: ```python from dagster import Definitions, EnvVar from dagster_airbyte import AirbyteCloudResource, build_airbyte_assets airbyte_instance = AirbyteCloudResource( client_id=EnvVar("AIRBYTE_CLIENT_ID"), client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"), ) airbyte_assets = build_airbyte_assets( # Test connection - Sample Data (Faker) to Google Sheets connection_id="0bb7a00c-0b85-4fac-b8ff-67dc380f1c29", destination_tables=["products", "purchases", "users"], ) defs = Definitions(assets=airbyte_assets, resources={"airbyte": airbyte_instance}) ``` Job successful: <img width="1037" alt="Screenshot 2024-08-13 at 4 30 12 PM" src="https://github.com/user-attachments/assets/46c91c95-b597-48de-bcbd-8bbb38f5f6d4"> Asset graph: <img width="1037" alt="Screenshot 2024-08-13 at 4 30 38 PM" src="https://github.com/user-attachments/assets/eae8ab12-0da9-4956-8ae6-10932cd0c199">
…lient (#26241) ## Summary & Motivation This PR implements the base `_make_request` method for AirbyteCloudClient and the methods that are required to handle the creation of the access token. About the access token, the logic was taken and reworked from this [previous implementation](#23451). TL;DR, Airbyte now requires a client ID and secret to generate an access token - this access token expires every 3 minutes. See more [here](https://reference.airbyte.com/reference/portalairbytecom-deprecation). ## How I Tested These Changes Additional tests
…lient (dagster-io#26241) ## Summary & Motivation This PR implements the base `_make_request` method for AirbyteCloudClient and the methods that are required to handle the creation of the access token. About the access token, the logic was taken and reworked from this [previous implementation](dagster-io#23451). TL;DR, Airbyte now requires a client ID and secret to generate an access token - this access token expires every 3 minutes. See more [here](https://reference.airbyte.com/reference/portalairbytecom-deprecation). ## How I Tested These Changes Additional tests
Summary & Motivation
This PR updates the
AirbyteCloudResource
to supportclient_id
andclient_secret
for authentication. Theapi_key
can no longer be used for authentication because Airbyte is deprecating portal.airbyte.com.Tests and docs are updated to reflect the changes.
Two main questions for reviewers:
setup_for_execution
, then refreshed if needed. I'm concerned that for jobs including other assets, it might take more than 3 minutes before the Airbyte assets are materialized.api_key
attribute without deprecation warning. Are we comfortable doing so? Considering that this approach will fail next week.How I Tested These Changes
BK with updated tests
Fully tested on a live cloud instance with this code:
Job successful:
Asset graph: