Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource #23451

Conversation

maximearmstrong
Copy link
Contributor

@maximearmstrong maximearmstrong commented Aug 6, 2024

Summary & Motivation

This PR updates the AirbyteCloudResource to support client_id and client_secret for authentication. The api_key can no longer be used for authentication because Airbyte is deprecating portal.airbyte.com.

Tests and docs are updated to reflect the changes.

Two main questions for reviewers:

  • this PR implements a pattern where the access token is refreshed before making a call to the API, if the token was fetched more than 2.5 minutes ago. Should we avoid this pattern and let users manage the resource lifecycle?
    • According to Airbyte, the access token expires after 3 minutes.
    • The access token is initially fetched in setup_for_execution, then refreshed if needed. I'm concerned that for jobs including other assets, it might take more than 3 minutes before the Airbyte assets are materialized.
  • portal.airbyte.com will be deprecated next week, on August 15th, so I removed the previous api_key attribute without deprecation warning. Are we comfortable doing so? Considering that this approach will fail next week.

How I Tested These Changes

BK with updated tests

Fully tested on a live cloud instance with this code:

from dagster import Definitions, EnvVar
from dagster_airbyte import AirbyteCloudResource, build_airbyte_assets

airbyte_instance = AirbyteCloudResource(
    client_id=EnvVar("AIRBYTE_CLIENT_ID"),
    client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"),
)

airbyte_assets = build_airbyte_assets(
    # Test connection - Sample Data (Faker) to Google Sheets 
    connection_id="0bb7a00c-0b85-4fac-b8ff-67dc380f1c29",
    destination_tables=["products", "purchases", "users"],
)

defs = Definitions(assets=airbyte_assets, resources={"airbyte": airbyte_instance})

Job successful:
Screenshot 2024-08-13 at 4 30 12 PM

Asset graph:
Screenshot 2024-08-13 at 4 30 38 PM

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @maximearmstrong and the rest of your teammates on Graphite Graphite

@maximearmstrong maximearmstrong changed the title [dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource [wip][dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource Aug 7, 2024
@maximearmstrong maximearmstrong self-assigned this Aug 7, 2024
@maximearmstrong maximearmstrong changed the title [wip][dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource [wip][dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource Aug 7, 2024
@maximearmstrong maximearmstrong force-pushed the maxime/ds-396/support-client-id-and-secret-for-airbyte-cloud branch from e11cfb3 to d9720b3 Compare August 8, 2024 18:10
@maximearmstrong maximearmstrong changed the title [wip][dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource [dagster-airbyte] Support client_id and client_secret in AirbyteCloudResource Aug 8, 2024
@maximearmstrong maximearmstrong marked this pull request as ready for review August 8, 2024 20:05
@maximearmstrong maximearmstrong requested a review from yuhan August 8, 2024 20:05
@graphite-app graphite-app bot added the area: docs Related to documentation in general label Aug 8, 2024
@graphite-app graphite-app bot requested a review from erinkcochran87 August 8, 2024 20:05
@maximearmstrong maximearmstrong removed the request for review from erinkcochran87 August 8, 2024 20:17
@garethbrickman garethbrickman linked an issue Aug 8, 2024 that may be closed by this pull request
Copy link
Member

@benpankow benpankow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

portal.airbyte.com will be deprecated next week, on August 15th, so I removed the previous api_key attribute without deprecation warning. Are we comfortable doing so? Considering that this approach will fail next week.

Seems OK to me since it's going to break this week 👍

Have we tested this new API against a live Cloud instance? Would be good to sanity check beyond the tests, which look good to me.

not self._access_token_value
or not self._access_token_timestamp
or self._access_token_timestamp
<= datetime.timestamp(datetime.now() - timedelta(seconds=150))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider pulling out 150 seconds into a constant here and putting at top of file so it's easier to spot/edit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in f2fefcb


@property
def api_base_url(self) -> str:
return "https://api.airbyte.com/v1"

@property
def all_additional_request_params(self) -> Mapping[str, Any]:
return {"headers": {"Authorization": f"Bearer {self.api_key}", "User-Agent": "dagster"}}
# Make sure the access token is refreshed before using it when calling the API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a @property, we should avoid doing real work inside the fn body (like refreshing the access token). One different approach could be be to pull out the refresh into a separate fn like _get_or_refresh_access_token() so that it's clear that invoking it may have a cost, and call that directly in make_request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to make_request in f2fefcb

self._access_token_value = str(response["access_token"])
self._access_token_timestamp = datetime.now().timestamp()

def _needs_refreshed_access_token(self) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior makes sense to me, this is low enough complexity that it feels ok to abstract from the user

@maximearmstrong maximearmstrong force-pushed the maxime/ds-396/support-client-id-and-secret-for-airbyte-cloud branch from 865a662 to f2fefcb Compare August 13, 2024 19:20
@maximearmstrong
Copy link
Contributor Author

Have we tested this new API against a live Cloud instance? Would be good to sanity check beyond the tests, which look good to me.

@benpankow It was tested against a live Cloud Instance. I've updated the PR description to add the code and result.

@maximearmstrong maximearmstrong merged commit 1b05f7c into master Aug 13, 2024
2 checks passed
@maximearmstrong maximearmstrong deleted the maxime/ds-396/support-client-id-and-secret-for-airbyte-cloud branch August 13, 2024 20:41
clairelin135 pushed a commit that referenced this pull request Aug 13, 2024
…loudResource (#23451)

## Summary & Motivation

This PR updates the `AirbyteCloudResource` to support `client_id` and
`client_secret` for authentication. The `api_key` can no longer be used
for authentication because Airbyte is [deprecating
portal.airbyte.com](https://reference.airbyte.com/reference/portalairbytecom-deprecation).

Tests and docs are updated to reflect the changes.

Two main questions for reviewers:
- this PR implements a pattern where the access token is refreshed
before making a call to the API, if the token was fetched more than 2.5
minutes ago. Should we avoid this pattern and let users manage the
resource lifecycle?
- [According to
Airbyte](https://reference.airbyte.com/reference/portalairbytecom-deprecation),
the access token expires after 3 minutes.
- The access token is initially fetched in `setup_for_execution`, then
refreshed if needed. I'm concerned that for jobs including other assets,
it might take more than 3 minutes before the Airbyte assets are
materialized.
- [portal.airbyte.com will be deprecated next
week](https://reference.airbyte.com/reference/portalairbytecom-deprecation),
on August 15th, so I removed the previous `api_key` attribute without
deprecation warning. Are we comfortable doing so? Considering that this
approach will fail next week.

## How I Tested These Changes

BK with updated tests

Fully tested on a live cloud instance with this code: 

```python
from dagster import Definitions, EnvVar
from dagster_airbyte import AirbyteCloudResource, build_airbyte_assets

airbyte_instance = AirbyteCloudResource(
    client_id=EnvVar("AIRBYTE_CLIENT_ID"),
    client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"),
)

airbyte_assets = build_airbyte_assets(
    # Test connection - Sample Data (Faker) to Google Sheets 
    connection_id="0bb7a00c-0b85-4fac-b8ff-67dc380f1c29",
    destination_tables=["products", "purchases", "users"],
)

defs = Definitions(assets=airbyte_assets, resources={"airbyte": airbyte_instance})
```

Job successful:
<img width="1037" alt="Screenshot 2024-08-13 at 4 30 12 PM"
src="https://github.com/user-attachments/assets/46c91c95-b597-48de-bcbd-8bbb38f5f6d4">

Asset graph:
<img width="1037" alt="Screenshot 2024-08-13 at 4 30 38 PM"
src="https://github.com/user-attachments/assets/eae8ab12-0da9-4956-8ae6-10932cd0c199">
PedramNavid pushed a commit that referenced this pull request Aug 14, 2024
…loudResource (#23451)

## Summary & Motivation

This PR updates the `AirbyteCloudResource` to support `client_id` and
`client_secret` for authentication. The `api_key` can no longer be used
for authentication because Airbyte is [deprecating
portal.airbyte.com](https://reference.airbyte.com/reference/portalairbytecom-deprecation).

Tests and docs are updated to reflect the changes.

Two main questions for reviewers:
- this PR implements a pattern where the access token is refreshed
before making a call to the API, if the token was fetched more than 2.5
minutes ago. Should we avoid this pattern and let users manage the
resource lifecycle?
- [According to
Airbyte](https://reference.airbyte.com/reference/portalairbytecom-deprecation),
the access token expires after 3 minutes.
- The access token is initially fetched in `setup_for_execution`, then
refreshed if needed. I'm concerned that for jobs including other assets,
it might take more than 3 minutes before the Airbyte assets are
materialized.
- [portal.airbyte.com will be deprecated next
week](https://reference.airbyte.com/reference/portalairbytecom-deprecation),
on August 15th, so I removed the previous `api_key` attribute without
deprecation warning. Are we comfortable doing so? Considering that this
approach will fail next week.

## How I Tested These Changes

BK with updated tests

Fully tested on a live cloud instance with this code: 

```python
from dagster import Definitions, EnvVar
from dagster_airbyte import AirbyteCloudResource, build_airbyte_assets

airbyte_instance = AirbyteCloudResource(
    client_id=EnvVar("AIRBYTE_CLIENT_ID"),
    client_secret=EnvVar("AIRBYTE_CLIENT_SECRET"),
)

airbyte_assets = build_airbyte_assets(
    # Test connection - Sample Data (Faker) to Google Sheets 
    connection_id="0bb7a00c-0b85-4fac-b8ff-67dc380f1c29",
    destination_tables=["products", "purchases", "users"],
)

defs = Definitions(assets=airbyte_assets, resources={"airbyte": airbyte_instance})
```

Job successful:
<img width="1037" alt="Screenshot 2024-08-13 at 4 30 12 PM"
src="https://github.com/user-attachments/assets/46c91c95-b597-48de-bcbd-8bbb38f5f6d4">

Asset graph:
<img width="1037" alt="Screenshot 2024-08-13 at 4 30 38 PM"
src="https://github.com/user-attachments/assets/eae8ab12-0da9-4956-8ae6-10932cd0c199">
maximearmstrong added a commit that referenced this pull request Dec 5, 2024
…lient (#26241)

## Summary & Motivation

This PR implements the base `_make_request` method for
AirbyteCloudClient and the methods that are required to handle the
creation of the access token.

About the access token, the logic was taken and reworked from this
[previous
implementation](#23451).
TL;DR, Airbyte now requires a client ID and secret to generate an access
token - this access token expires every 3 minutes. See more
[here](https://reference.airbyte.com/reference/portalairbytecom-deprecation).

## How I Tested These Changes

Additional tests
pskinnerthyme pushed a commit to pskinnerthyme/dagster that referenced this pull request Dec 16, 2024
…lient (dagster-io#26241)

## Summary & Motivation

This PR implements the base `_make_request` method for
AirbyteCloudClient and the methods that are required to handle the
creation of the access token.

About the access token, the logic was taken and reworked from this
[previous
implementation](dagster-io#23451).
TL;DR, Airbyte now requires a client ID and secret to generate an access
token - this access token expires every 3 minutes. See more
[here](https://reference.airbyte.com/reference/portalairbytecom-deprecation).

## How I Tested These Changes

Additional tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: docs Related to documentation in general
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Dagster Airbyte to use the new Airbyte API
2 participants