Implement a basic generic API client #313

Merged on Mar 22, 2024 (124 commits). Changes shown from 26 commits.

Commits
7d65701
Implement a basic generic API client; convert some sources to use thi…
burnash Dec 29, 2023
f7af81b
Move the pagination loop into APIClient
burnash Jan 23, 2024
4bfdc13
Factor out common code
burnash Jan 23, 2024
3ce56c1
Refactor common code
burnash Jan 24, 2024
ce784d4
add generic rest source and an example pipeline
burnash Jan 31, 2024
8d7d0bd
Restructure the REST client add pagination detector
burnash Feb 5, 2024
8bde92a
fix the paginator detector
burnash Feb 5, 2024
4b686a0
Accept paginator instance
burnash Feb 5, 2024
2e297e2
Add Offset paginator
burnash Feb 5, 2024
f5db3aa
Add comments
burnash Feb 5, 2024
650c3ca
Factor out resources
burnash Feb 5, 2024
e61e304
Add Literal
burnash Feb 5, 2024
5e63d33
Remove the example
burnash Feb 5, 2024
d55ae29
Add logging
burnash Feb 6, 2024
01fe721
Handle depended resources
burnash Feb 7, 2024
8138403
Fix the bug with duplication of nested sources
burnash Feb 8, 2024
ec689fe
Add an alternative version that uses classes
burnash Feb 8, 2024
afce8d3
Rearrange config
burnash Feb 8, 2024
3fe3af7
REST API: support all authentication methods (#354)
willi-mueller Feb 15, 2024
b2e7cec
Generic API client: include parent fields in child resource (#355)
willi-mueller Feb 16, 2024
0ea0edb
Resource based config
burnash Feb 14, 2024
16c5a7a
Receive a custom Session instance
burnash Feb 16, 2024
c6015fe
Include data from parent resource in child resource: ported to a new …
burnash Feb 19, 2024
d4a3160
Rest API: Ends pagination if next page path is not in response.json()…
willi-mueller Feb 19, 2024
9f956ba
Allow specification of SinglePagePaginator and refactors redundancy (…
willi-mueller Feb 20, 2024
7063337
Use the resource name as an endpoint path if path is missing
burnash Feb 20, 2024
1e1e676
[REST Source] renames default_paginator argument to paginator (#367)
willi-mueller Feb 22, 2024
884120b
Remove the legacy version
burnash Feb 21, 2024
d5eaee1
Add `records_key` to `SinglePagePaginator`
burnash Feb 22, 2024
2f580a3
[REST Source] completes renaming of default_paginator to paginator (#…
willi-mueller Feb 22, 2024
168b11a
Add tests and pagination
burnash Feb 25, 2024
e13cff2
Temporary disable paginator type check
burnash Feb 26, 2024
f14f539
[REST source] test case for dependent resource (#371)
willi-mueller Feb 26, 2024
10bd716
Remove comments
burnash Feb 26, 2024
0b31301
Reuse MOCK_BASE_URL for all endpoints
burnash Feb 27, 2024
65c6617
Rename the config container
burnash Feb 27, 2024
a554efa
Add tests for valid source configurations
burnash Feb 27, 2024
e6e6927
Add Flask-style pagination
burnash Feb 27, 2024
06a054e
[REST API source] adds function to check connection (#357)
willi-mueller Feb 27, 2024
726b204
[REST Source] allow skipping http errors (#365)
willi-mueller Feb 27, 2024
0b99ba5
added the possibility to pass HTTPBasicAuth objects (#377)
francescomucio Feb 27, 2024
0631c98
Factor out typings
burnash Feb 27, 2024
d1d25f3
Add response_actions to enable skipping responses by status code or c…
burnash Feb 28, 2024
76710ea
Move records extractor out of the paginator class
burnash Feb 28, 2024
f3ea829
[REST] Detailed error handler logging (#383)
willi-mueller Feb 29, 2024
16cb89a
Fixes records detection for header links paginator
burnash Feb 29, 2024
987fdfa
[REST source] header_links can extract from responses without a recor…
willi-mueller Feb 29, 2024
71b4682
[REST source] fixes deprecation warning (#380)
willi-mueller Mar 1, 2024
eeea3a8
Use update_dict_nested in place of deep_merge
burnash Mar 1, 2024
4d01f35
Update the lockfile
burnash Mar 1, 2024
d732976
Add requirements.txt
burnash Mar 1, 2024
ac39f62
Upgrade dlt version
burnash Mar 2, 2024
14748a4
Rename records_path to data_selector
burnash Mar 4, 2024
39af289
Mutate request objects in paginators
burnash Mar 4, 2024
3d9c87e
Merge branch 'master' into enh/api_helper
burnash Mar 4, 2024
12b3726
Regenerate lock
burnash Mar 4, 2024
69c1300
Remove `request_client` param from RESTClient; set `raise_for_status`…
burnash Mar 5, 2024
79030f8
Pass all incremental params from config
burnash Mar 5, 2024
46e3385
Refactor to argument unpacking
burnash Mar 5, 2024
4b58b7b
Add more auth classes
burnash Mar 5, 2024
13e21b8
Factor out records extractor logic
burnash Mar 6, 2024
efd0d80
Add tests for detectors
burnash Mar 6, 2024
dbe1b65
Remove UnspecifiedPaginator
burnash Mar 6, 2024
fefd704
[REST CLIENT] alt response extractor (#396)
rudolfix Mar 6, 2024
c535088
makes openapi friendly auth (#397)
rudolfix Mar 6, 2024
5a1f3b5
Bring detect_paginator back
burnash Mar 6, 2024
05e899e
Fix test case for nested key (next.url); format code
burnash Mar 6, 2024
0c2ddcf
Revert Notion source
burnash Mar 6, 2024
45be0cf
Revert Personio and Zendesk
burnash Mar 6, 2024
9ba18a7
Remove an unused file
burnash Mar 6, 2024
8734678
Restore personio settings
burnash Mar 6, 2024
739547c
Restore personio tests and
burnash Mar 6, 2024
4148a69
Add type annotations
burnash Mar 6, 2024
b15f2dd
bumps dlt to 0.4.6
rudolfix Mar 7, 2024
01a08cb
Type fixes and dlt session check
burnash Mar 7, 2024
e2aac86
[REST CLIENT] yields data pages with requests context (#399)
rudolfix Mar 8, 2024
3a67ac8
Fix paginator_config unpacking and RESTClient typing errors
burnash Mar 9, 2024
639a649
Fix more typing errors
burnash Mar 9, 2024
f29655e
Fix E741
burnash Mar 10, 2024
bd19ee2
Fix linting errors
burnash Mar 10, 2024
bdd6feb
Update the lock file
burnash Mar 10, 2024
6ce75e0
Extract build_resource_dependency_graph()
burnash Mar 10, 2024
4bc5340
Factor out create_resources()
burnash Mar 10, 2024
d0a22d9
Use requests hooks to handle response actions
burnash Mar 10, 2024
86eb1ed
Derive the response exception from DltException
burnash Mar 10, 2024
ed81495
Resolve poetry.lock conflict
burnash Mar 10, 2024
3ae0ba5
Fix black check
burnash Mar 10, 2024
2d336e9
Fix lint
burnash Mar 10, 2024
4a90485
Make default token expiration configurable
burnash Mar 10, 2024
55a5924
Add missing http headers
burnash Mar 10, 2024
dad1e96
Refactor paginator creation in RESTClient to use PaginatorFactory
burnash Mar 10, 2024
e39aaf9
Use frozensets
burnash Mar 10, 2024
b2715c9
Remove an unused import
burnash Mar 10, 2024
23b22a5
Update docstrings
burnash Mar 10, 2024
e8cfa32
Accept additional dlt source arguments in `rest_api_source()`
burnash Mar 10, 2024
24ec947
Add a workaround to pass test_dlt_init
burnash Mar 10, 2024
94beb5b
Extend config test with an auth class instance case
burnash Mar 12, 2024
ba46fda
Remove Any from PaginatorType
burnash Mar 12, 2024
82f3357
Upgrade dlt
burnash Mar 12, 2024
d278643
Update lock file
burnash Mar 12, 2024
3141ecc
Remove commented code
burnash Mar 12, 2024
94352ed
Refactor configuration setup into a dedicated module
burnash Mar 13, 2024
70496ec
Move response hooks setup and handling out of RESTClient
burnash Mar 13, 2024
2e3d50a
Remove unused imports
burnash Mar 13, 2024
b6d3794
Fix hooks typing
burnash Mar 14, 2024
3b7a0b6
Rename args of the OffsetPaginator
burnash Mar 20, 2024
b971f9b
Create a RESTClient per resource
burnash Mar 20, 2024
ebff65c
Handle both error statuses and response actions
burnash Mar 20, 2024
4fb4ce1
Initial version of the README.md (#389)
francescomucio Mar 21, 2024
59bba2e
Remove commented code
burnash Mar 20, 2024
390c233
Clean up docstrings
burnash Mar 20, 2024
5ecc426
Remove the useless conditional init for items
burnash Mar 20, 2024
6924233
Fix grammar in the README
burnash Mar 21, 2024
ff5ddd8
Remove UnspecifiedPaginator
burnash Mar 21, 2024
c24a5aa
Format README
burnash Mar 21, 2024
d63e30f
Add handling end_value and end_param
burnash Mar 21, 2024
5cbf863
Use NamedTuple for incremental params
burnash Mar 22, 2024
6b7a891
Move check_connection to utils
burnash Mar 22, 2024
6dab451
Instantiate auth based on type
burnash Mar 22, 2024
32e8327
Merge branch 'master' into enh/api_helper
burnash Mar 22, 2024
559f3a2
Update lock file
burnash Mar 22, 2024
cf8e266
Sthor/api helper updates (#400)
steinitzu Mar 22, 2024
b8e6b5d
Remove unused imports
burnash Mar 22, 2024
e6d927c
Use jsonpath for next_path; remove create_nested_accessor
burnash Mar 22, 2024
1 change: 1 addition & 0 deletions sources/api_client.py
@@ -0,0 +1 @@
+from rest_api.client import RESTClient
20 changes: 16 additions & 4 deletions sources/notion/__init__.py
@@ -4,6 +4,9 @@

 import dlt
 from dlt.sources import DltResource
+from rest_api import RESTClient, BearerTokenAuth
+from .settings import API_URL, DEFAULT_HEADERS
+from .helpers.paginator import NotionPaginator

 from .helpers.client import NotionClient
 from .helpers.database import NotionDatabase
@@ -27,15 +30,24 @@ def notion_databases(
     Yields:
         DltResource: Data resources from Notion databases.
     """
-    notion_client = NotionClient(api_key)
+    notion_client = RESTClient(
+        base_url=API_URL,
+        headers=DEFAULT_HEADERS,
+        auth=BearerTokenAuth(api_key),
+        paginator=NotionPaginator(),
+    )

     if database_ids is None:
-        search_results = notion_client.search(
-            filter_criteria={"value": "database", "property": "object"}
+        search_results = notion_client.paginate(
+            "/search",
+            json={"filter": {"value": "database", "property": "object"}},
+            method="post",
         )

         database_ids = [
             {"id": result["id"], "use_name": result["title"][0]["plain_text"]}
-            for result in search_results
+            for page in search_results
+            for result in page
         ]

     for database in database_ids:
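The switch from `for result in search_results` to a nested comprehension suggests that `paginate()` yields one list of records per page rather than individual records. A minimal sketch of that flattening follows; `fake_paginate` and its page contents are hypothetical stand-ins for the real client call, made up for illustration only.

```python
from typing import Any, Dict, Iterator, List


def fake_paginate() -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical stand-in for a paginate() call: yields one list per page."""
    yield [{"id": "db-1", "title": [{"plain_text": "Tasks"}]}]
    yield [{"id": "db-2", "title": [{"plain_text": "Notes"}]}]


# The outer loop walks pages, the inner loop walks the records inside each page.
database_ids = [
    {"id": result["id"], "use_name": result["title"][0]["plain_text"]}
    for page in fake_paginate()
    for result in page
]
```

Iterating over the generator directly, as the old code did over a flat result list, would hand each whole page to the `result["id"]` lookup and fail on the list.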
38 changes: 17 additions & 21 deletions sources/notion/helpers/database.py
@@ -2,7 +2,7 @@

 from dlt.common.typing import TDataItem

-from .client import NotionClient
+from api_client import RESTClient


 class NotionDatabase:
@@ -14,7 +14,7 @@ class NotionDatabase:
         notion_client (NotionClient): A client to interact with the Notion API.
     """

-    def __init__(self, database_id: str, notion_client: NotionClient):
+    def __init__(self, database_id: str, notion_client: RESTClient):
         self.database_id = database_id
         self.notion_client = notion_client

@@ -27,7 +27,7 @@ def get_structure(self) -> Any:
         Returns:
             Any: The structure of the database.
         """
-        return self.notion_client.fetch_resource("databases", self.database_id)
+        return self.notion_client.get(f"databases/{self.database_id}")

     def query(
         self,
@@ -57,22 +57,18 @@ def query(
         Yields:
             List[Dict[str, Any]]: A record from the database.
         """
-        while True:
-            payload = {
-                "filter": filter_criteria,
-                "sorts": sorts,
-                "start_cursor": start_cursor,
-                "page_size": page_size,
-            }
-            response = self.notion_client.send_payload(
-                "databases",
-                self.database_id,
-                subresource="query",
-                query_params=filter_properties,
-                payload=payload,
-            )
+        payload = {
+            "filter": filter_criteria,
+            "sorts": sorts,
+            "start_cursor": start_cursor,
+            "page_size": page_size,
+        }

-            yield response.get("results", [])
-            if not response.get("has_more"):
-                break
-            start_cursor = response.get("next_cursor")
+        filtered_payload = {k: v for k, v in payload.items() if v is not None}
+
+        return self.notion_client.paginate(
+            f"databases/{self.database_id}/query",
+            params=filter_properties,
+            json=filtered_payload,
+            method="post",
+        )
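The new `query()` drops unset fields from the request body before posting it. A small sketch of that filtering step, pulled out into a hypothetical helper (`build_query_payload` is not part of the PR; the field names are taken from the diff above):

```python
from typing import Any, Dict, List, Optional


def build_query_payload(
    filter_criteria: Optional[Dict[str, Any]] = None,
    sorts: Optional[List[Dict[str, Any]]] = None,
    start_cursor: Optional[str] = None,
    page_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a JSON body, omitting fields left as None."""
    payload = {
        "filter": filter_criteria,
        "sorts": sorts,
        "start_cursor": start_cursor,
        "page_size": page_size,
    }
    # `is not None` keeps falsy but meaningful values such as [] or 0.
    return {k: v for k, v in payload.items() if v is not None}


only_page_size = build_query_payload(page_size=50)
empty_sorts = build_query_payload(sorts=[])
```

Testing against `None` rather than truthiness is the relevant design choice here: an explicitly empty `sorts` list is still sent, while arguments the caller never supplied are left out of the body.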
14 changes: 14 additions & 0 deletions sources/notion/helpers/paginator.py
@@ -0,0 +1,14 @@
+from dlt.sources.helpers.requests import Response
+from rest_api import JSONResponsePaginator
+
+class NotionPaginator(JSONResponsePaginator):
+    def __init__(self, cursor_key='next_cursor', records_key='results'):
+        super().__init__(next_key=cursor_key, records_key=records_key)
+
+    def prepare_next_request_args(self, url, params, json):
+        json = json or {}

Review comment (Contributor) on `json = json or {}`: I think this argument name might be inconvenient if someone imports the `json` module while debugging.

Reply (Collaborator Author): Correct. The reason I picked this name for the argument is to make the function interface a bit more consistent with Requests.

+
+        if self.next_reference:
+            json["start_cursor"] = self.next_reference
+
+        return url, params, json
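To make the cursor flow concrete, here is a self-contained sketch of how a paginator like this might drive a request loop. It is not the `rest_api` implementation: `CursorPaginator`, `update_state`, `paginate`, and the canned `FAKE_RESPONSES` are all hypothetical, and the request body argument is named `body` to sidestep the `json` shadowing concern raised in the review thread.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical canned responses, keyed by the cursor sent in the request body.
FAKE_RESPONSES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"results": [1, 2], "next_cursor": "abc"},
    "abc": {"results": [3], "next_cursor": None},
}


class CursorPaginator:
    """Illustrative cursor paginator (an assumption, not the library class)."""

    def __init__(self, cursor_key: str = "next_cursor", records_key: str = "results"):
        self.cursor_key = cursor_key
        self.records_key = records_key
        self.next_reference: Optional[str] = None

    def update_state(self, response: Dict[str, Any]) -> None:
        # Remember the cursor for the next page; None means there is no next page.
        self.next_reference = response.get(self.cursor_key)

    def prepare_next_request_args(self, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        # Copy the body so the caller's dict is not mutated.
        body = dict(body or {})
        if self.next_reference:
            body["start_cursor"] = self.next_reference
        return body


def paginate(paginator: CursorPaginator) -> Iterator[List[int]]:
    """Yield one list of records per page until no cursor is returned."""
    body: Dict[str, Any] = {}
    while True:
        response = FAKE_RESPONSES[body.get("start_cursor")]  # stands in for an HTTP POST
        yield response[paginator.records_key]
        paginator.update_state(response)
        if not paginator.next_reference:
            break
        body = paginator.prepare_next_request_args(body)


pages = list(paginate(CursorPaginator()))
```

The loop stops as soon as a response carries no cursor, which mirrors the `has_more` / `next_cursor` check that the old hand-written `while True` loop in `database.py` performed.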
4 changes: 4 additions & 0 deletions sources/notion/settings.py
@@ -1,3 +1,7 @@
 """Notion source settings and constants"""

 API_URL = "https://api.notion.com/v1"
+DEFAULT_HEADERS = {
+    "accept": "application/json",
+    "Notion-Version": "2022-06-28"
+}
6 changes: 4 additions & 2 deletions sources/personio/__init__.py
@@ -9,7 +9,7 @@
 from dlt.sources import DltResource

 from .helpers import PersonioAPI
-from .settings import DEFAULT_ITEMS_PER_PAGE, FIRST_DAY_OF_MILLENNIUM
+from .settings import BASE_URL, DEFAULT_ITEMS_PER_PAGE, FIRST_DAY_OF_MILLENNIUM


 @dlt.source(name="personio")
@@ -29,7 +29,9 @@ def personio_source(
         Iterable: A list of DltResource objects representing the data resources.
     """

-    client = PersonioAPI(client_id, client_secret)
+    client = PersonioAPI(
+        base_url=BASE_URL, client_id=client_id,client_secret=client_secret
+    )

     @dlt.resource(primary_key="id", write_disposition="merge")
     def employees(
65 changes: 19 additions & 46 deletions sources/personio/helpers.py
@@ -5,35 +5,33 @@
 from dlt.common.typing import Dict, TDataItems
 from dlt.sources.helpers import requests

+from ..api_client import RESTClient, BearerTokenAuth
+from .paginator import Paginator

-class PersonioAPI:
-    """A Personio API client."""

-    base_url = "https://api.personio.de/v1/"
+class PersonioAPI(RESTClient):
+    """A Personio API client."""

-    def __init__(self, client_id: str, client_secret: str) -> None:
+    def __init__(self, base_url: str, client_id: str, client_secret: str) -> None:
         """
         Args:
             client_id: The client ID of your app.
             client_secret: The client secret of your app.
         """
-        self.client_id = client_id
-        self.client_secret = client_secret
-        self.access_token = self.get_token()
+        self.access_token = self.get_token(base_url, client_id, client_secret)
+        super().__init__(base_url, auth=BearerTokenAuth(self.access_token))

-    def get_token(self) -> str:
+    def get_token(self, base_url: str, client_id: str, client_secret: str) -> str:
         """Get an access token from Personio.

         Returns:
             The access token.
         """
-        headers = {"Content-Type": "application/json", "Accept": "application/json"}
-        data = {"client_id": self.client_id, "client_secret": self.client_secret}
-        url = urljoin(self.base_url, "auth")
-        response = requests.request("POST", url, headers=headers, json=data)
-        json_response = response.json()
-        token: str = json_response["data"]["token"]
-        return token
+        url = urljoin(base_url, "auth")
+        response = requests.post(
+            url, json={"client_id": client_id, "client_secret": client_secret}
+        )
+        return response.json()["data"]["token"]

     def get_pages(
         self,
@@ -52,34 +50,9 @@ def get_pages(
             List of data items from the page
         """
         params = params or {}
-        headers = {"Authorization": f"Bearer {self.access_token}"}
-        params.update({"offset": int(offset_by_page), "page": int(offset_by_page)})
-        url = urljoin(self.base_url, resource)
-        starts_from_zero = False
-        while True:
-            response = requests.get(url, headers=headers, params=params)
-            json_response = response.json()
-            # Get an item list from the page
-            yield json_response["data"]
-
-            metadata = json_response.get("metadata")
-            if not metadata:
-                break
-
-            total_pages = metadata.get("total_pages")
-            current_page = metadata.get("current_page")
-            if current_page == 0:
-                starts_from_zero = True
-
-            if (
-                current_page >= (total_pages - int(starts_from_zero))
-                or not json_response["data"]
-            ):
-                break
-
-            if offset_by_page:
-                params["offset"] += 1
-                params["page"] += 1
-            else:
-                params["offset"] += params["limit"]
-                params["page"] += 1
+        for page_content in self.paginate(
+            path=resource,
+            params=params,
+            paginator=Paginator(offset_by_page=offset_by_page),
+        ):
+            yield page_content
30 changes: 30 additions & 0 deletions sources/personio/paginator.py
@@ -0,0 +1,30 @@
+class Paginator:
+    def __init__(self, offset_by_page=False):
+        self.offset_by_page = offset_by_page
+
+    def paginate(self, client, url, method, params, json):
+        starts_from_zero = False
+        while True:
+            response = client.make_request(url, method, params, json)
+
+            json_response = response.json()
+            yield json_response["data"]
+
+            metadata = json_response.get("metadata")
+            if not metadata:
+                break
+
+            total_pages = metadata.get("total_pages")
+            current_page = metadata.get("current_page")
+            if current_page == 0:
+                starts_from_zero = True
+
+            if current_page >= (total_pages - int(starts_from_zero)) or not json_response["data"]:
+                break
+
+            if self.offset_by_page:
+                params["offset"] += 1
+                params["page"] += 1
+            else:
+                params["offset"] += params["limit"]
+                params["page"] += 1
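The stop condition `current_page >= (total_pages - int(starts_from_zero))` handles APIs that number pages from either 0 or 1. A minimal sketch isolating that arithmetic; `is_last_page` is a hypothetical helper name, not something the PR defines.

```python
def is_last_page(current_page: int, total_pages: int, starts_from_zero: bool) -> bool:
    """Return True when current_page is the final page.

    With zero-based numbering the last page index is total_pages - 1;
    with one-based numbering it is total_pages.
    """
    return current_page >= total_pages - int(starts_from_zero)


# Three pages, zero-based: indices 0, 1, 2, so index 2 is the last page.
zero_based_last = is_last_page(2, 3, starts_from_zero=True)
# Three pages, one-based: page 2 still has a successor.
one_based_middle = is_last_page(2, 3, starts_from_zero=False)
```

Note that in the paginator above `starts_from_zero` only flips to `True` once a response reports `current_page == 0`, so the numbering scheme is inferred from the first page rather than configured.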
1 change: 1 addition & 0 deletions sources/personio/settings.py
@@ -1,2 +1,3 @@
+BASE_URL = "https://api.personio.de/v1/"
 DEFAULT_ITEMS_PER_PAGE = 200
 FIRST_DAY_OF_MILLENNIUM = "2000-01-01"
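The trailing slash in `BASE_URL` matters because `get_token` builds the auth URL with `urljoin(base_url, "auth")`. A short sketch of the standard-library behaviour; the variable names are illustrative only.

```python
from urllib.parse import urljoin

# With a trailing slash, the relative path is appended under /v1/.
with_slash = urljoin("https://api.personio.de/v1/", "auth")

# Without it, urljoin replaces the last path segment ("v1") instead.
without_slash = urljoin("https://api.personio.de/v1", "auth")

# A leading slash on the relative part resets the path to the host root.
rooted = urljoin("https://api.personio.de/v1/", "/auth")
```

Here `with_slash` is `https://api.personio.de/v1/auth`, while the other two resolve to `https://api.personio.de/auth`, so callers passing a custom `base_url` would need to keep the trailing slash for this join to target the versioned endpoint.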