-
Notifications
You must be signed in to change notification settings - Fork 4
Add verify_all_repo_access config parameter #8
base: main
Are you sure you want to change the base?
Conversation
I see the point of this, it can be useful. I'm concerned tho that disabling this feature would end up delaying catching problematic repos and risking the health of sync, because the latter would not be able to proceed unless the access is fixed as opposed to keeping the sync healthy and having a failing discovery. I also think that this goes against what the discovery mode is meant to do. |
tap_github/__init__.py
Outdated
@@ -349,6 +349,8 @@ def verify_access_for_repo(config): | |||
session.headers.update({'authorization': 'token ' + access_token, 'per_page': '1', 'page': '1'}) | |||
|
|||
repositories = get_repos_from_config(config) | |||
if not config.get('verify_all_repo_access', True) and len(repositories) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to avoid the loop if that config is set instead of mutating the param passed.
Something like:
verifify_all = config.get('verify_all_repo_access', True)
if verify_all:
for repo in repositories:
verify_access(repo)
else:
[...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks addressed by b4f9e5a
It's no longer mutating the passed parameter however it keeps the loop to avoid repeating code. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other issue could be if you have diff repositories and don't have access to a few of them.
could happen in the same or diff org
fair points. The question is if we are happy that the current ppw config import is running for 15-20 minutes? Also it consumes 2000 api calls from the max 5000/hour and can cause the parallel running tap to fail by exceeding rate limit. For the same reason the parallel running tap can also cause importing new config and the discovery to fail. We have a lot of config changes a day, and each triggers a full discovery that requires 2000 api requests. Currently we can’t run the tap in business hours only once a day at night otherwise the tap or importing new ppw config is expected to fail. Running the tap multiple times a day is not a requirement tho. Having more than 5000 call rate limit is not supported by personal access tokens, and if we need more than that then we need to authenticate the tap as github app that gives 15000 requests/hour. This feature is optional, defaults to the current behaviour and it only adds an extra option that we not necessarily have to use. |
in that case the tap fails at runtime and not when importing the config. The two things are running at different time so it’s already possible that we have access when importing the config but have no access when running the tap. |
Problem
If lot of repositories selected then verifying access to all of them takes lot of time in discovery mode. In case of 2000 repositories it takes more than 10 minutes, consumes calls from API rate limit and not allowing to deploy new config changes quickly in parallel.
Proposed changes
Introducing
verify_all_repo_access
configurable parameter that defaults toTrue
. It it'False
then it verifies only one repository access from the selected list and not all of them. If no access to a certain repository then the tap will fail at sync time and not at discovery time making the discovery and config-import time much quicker.Types of changes
What types of changes does your code introduce to PipelineWise?
Checklist
setup.py
is an individual PR and not mixed with feature or bugfix PRs[AP-NNNN]
(if applicable. AP-NNNN = JIRA ID)AP-NNN
(if applicable. AP-NNN = JIRA ID)