-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨feat(source-microsoft-sharepoint): add all sites iteration #55912
base: master
Are you sure you want to change the base?
✨feat(source-microsoft-sharepoint): add all sites iteration #55912
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
/approve-regression-tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a lot of comments regarding the structure but the functionality seem to make sense to me. Feel free to push back on any of these as I'm not sure I fully understand the context
Office365-REST-Python-Client = "==2.5.5" | ||
smart-open = "==6.4.0" | ||
airbyte-cdk = {extras = ["file-based"], version = "^6"} | ||
airbyte-cdk = {extras = ["file-based"], version = "^6.38.5"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set this back to ^6
so that it gets the CDK updates?
return site | ||
|
||
|
||
def get_site_prefix(site: Site): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add return types for these added functions?
|
||
@lru_cache(maxsize=None) | ||
def get_site(graph_client: GraphClient, site_url: str = None): | ||
if site_url: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I know some tools would consider not returning immediately as a code smell (example). I would refactor this as such:
def get_site(graph_client: GraphClient, site_url: str = None):
if site_url:
return execute_query_with_retry(graph_client.sites.get_by_url(site_url))
return execute_query_with_retry(graph_client.sites.root.get())
@@ -103,6 +137,20 @@ def get_access_token(self): | |||
# Directly fetch a new access token from the auth_client each time it's called | |||
return self.auth_client._get_access_token()["access_token"] | |||
|
|||
def get_token_response_object(self, tenant_prefix: str = None) -> Callable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if we set the value to None
by default, this should be optional. However, I don't see a case where tenant_prefix
is not provided in the current code. What am I missing?
I see the _get_scope
supports this case but I'm not sure when this can happen
for site_drive in drives: | ||
all_sites_drives.add_child(site_drive) | ||
return all_sites_drives | ||
|
||
def get_site_drive(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this was outside of the scope of the PR but can we add typing here?
else: | ||
result = reader.get_site_drive() | ||
|
||
if expected_call == "drives": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open question: It seems like we have three distinct tests and that we tried to fit all of this in a parametrize
. I fear that if one of those cases change, we might need to update this test and it would impact the other cases just because they share the same test not because their behavior has also changed. WDYT?
|
||
|
||
@lru_cache(maxsize=None) | ||
def get_site(graph_client: GraphClient, site_url: str = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like get_site
is always called with get_site_prefix
. Should we make this private to avoid exposing more things than needed?
|
||
|
||
@lru_cache(maxsize=None) | ||
def get_site(graph_client: GraphClient, site_url: str = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how the lru_cache works but it seems like GraphClient
does not implement __eq__
which I assume will only hit the cache if it is the same reference object. Is this what we want here?
|
||
return found_sites | ||
|
||
def get_drives_from_sites(self, sites: List[MutableMapping[str, Any]]) -> EntityCollection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a private method, right? Should we prefix the name by _
?
List[MutableMapping[str, Any]]: A list of site information. | ||
""" | ||
_, root_site_prefix = get_site_prefix(get_site(self.one_drive_client)) | ||
ctx = self.get_client_context() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like get_site_prefix
is also called within get_client_context
. Since get_client_context
should be private, should we pass the output of get_site_prefix
as arguments for get_client_config
? It would prevent us from calling this twice in the same context
What
Enhanced the Microsoft SharePoint connector to iterate all the Sharepoint site drives.
The implementation supports:
Follow up of #54658
Resolves #37003
How
Review guide
airbyte-integrations/connectors/source-microsoft-sharepoint/source_microsoft_sharepoint/stream_reader.py
- Focus on the get_site_drive method implementationairbyte-integrations/connectors/source-microsoft-sharepoint/unit_tests/test_stream_reader.py
- Check the test_get_site_drive test casesUser Impact
No breaking changes to existing functionality
Can this PR be safely reverted and rolled back?
[x] YES 💚
[ ] NO ❌