
[Github] Error while checking for inaccessible repositories. Exception: 403 when trying to sync private repositories #2636

Open
spong opened this issue Jun 12, 2024 · 15 comments

Comments

@spong
Member

spong commented Jun 12, 2024

Bug Description

I was trying to sync some internal documentation from https://github.com/elastic/security-team, which is an Elastic private repository (not internal). If I specify the repo in the List of repositories field of the connector configuration, the sync fails with the following error:

Stack trace

[FMWK][22:44:49][ERROR] [Connector id: sdyRBZABSQy1BdxtPVqF, index name: github-docs, Sync job id: jeLCCpABSQy1BdxtYKnM] Error while checking for inaccessible repositories. Exception: 403, message='Forbidden', url=URL('https://api.github.com/graphql').
Traceback (most recent call last):
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 1361, in _get_invalid_repos_for_personal_access_token
    async for repo in self.github_client.get_org_repos(
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 926, in get_org_repos
    async for response in self.paginated_api_call(
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 853, in paginated_api_call
    response = await self.graphql(query=query, variables=variables)
  File "/Users/garrettspong/dev/connectors/connectors/utils.py", line 571, in wrapped
    raise e
  File "/Users/garrettspong/dev/connectors/connectors/utils.py", line 568, in wrapped
    return await func(*args, **kwargs)
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 779, in graphql
    return await self._get_client.graphql(
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/gidgethub/abc.py", line 264, in graphql
    status_code, response_headers, response_data = await self._request(
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/gidgethub/aiohttp.py", line 19, in _request
    async with self._session.request(
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/aiohttp/client.py", line 696, in _request
    resp.raise_for_status()
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden', url=URL('https://api.github.com/graphql')
[FMWK][22:44:49][ERROR] [Connector id: sdyRBZABSQy1BdxtPVqF, index name: github-docs, Sync job id: jeLCCpABSQy1BdxtYKnM] 403, message='Forbidden', url=URL('https://api.github.com/graphql')
Traceback (most recent call last):
  File "/Users/garrettspong/dev/connectors/connectors/sync_job_runner.py", line 167, in execute
    await self.data_provider.validate_config()
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 1466, in validate_config
    await self._remote_validation()
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 1429, in _remote_validation
    await self._validate_configured_repos()
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 1456, in _validate_configured_repos
    invalid_repos = await self.get_invalid_repos()
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 1269, in get_invalid_repos
    return await self._get_invalid_repos_for_personal_access_token()
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 1361, in _get_invalid_repos_for_personal_access_token
    async for repo in self.github_client.get_org_repos(
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 926, in get_org_repos
    async for response in self.paginated_api_call(
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 853, in paginated_api_call
    response = await self.graphql(query=query, variables=variables)
  File "/Users/garrettspong/dev/connectors/connectors/utils.py", line 571, in wrapped
    raise e
  File "/Users/garrettspong/dev/connectors/connectors/utils.py", line 568, in wrapped
    return await func(*args, **kwargs)
  File "/Users/garrettspong/dev/connectors/connectors/sources/github.py", line 779, in graphql
    return await self._get_client.graphql(
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/gidgethub/abc.py", line 264, in graphql
    status_code, response_headers, response_data = await self._request(
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/gidgethub/aiohttp.py", line 19, in _request
    async with self._session.request(
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/aiohttp/client.py", line 696, in _request
    resp.raise_for_status()
  File "/Users/garrettspong/dev/connectors/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden', url=URL('https://api.github.com/graphql')

To Reproduce

Steps to reproduce the behavior:

  1. Set up the GitHub connector with the following configuration and sync:

Expected behavior

So long as the access token has access to the repo (which it does), the content should be synced.

Environment

Running Kibana main from source, ES via yarn es snapshot, and the GitHub connector main from source as well.

Additional context

If you set List of repositories to * and provide the repo filter via an Advanced Filter (below), the sync works without issue.

Advanced Filter
[
  {
    "filter": {
      "pr": "is:pr  label:\"Team:Security Generative AI\""
    },
    "repository": "elastic/security-team"
  },
  {
    "filter": {
      "issue": "is:issue label:\"Team:Security Generative AI\""
    },
    "repository": "elastic/security-team"
  }
]
@spong spong added the bug Something isn't working label Jun 12, 2024
@parthpuri-elastic
Contributor

We are able to index documents from a private repository by specifying the repo in the list of repositories. We only receive a 403 Forbidden error when the rate limit is exceeded, and that applies to both private and public repositories.
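Because a 403 from api.github.com can mean either an exhausted rate limit or a genuine access problem, it helps to tell the two apart before drawing conclusions. A minimal sketch, assuming GitHub's documented rate-limit headers (x-ratelimit-remaining, retry-after) are available on the failed response; classify_forbidden and ForbiddenKind are hypothetical names, not connector code.

```python
from enum import Enum
from typing import Mapping, Optional


class ForbiddenKind(Enum):
    PRIMARY_RATE_LIMIT = "primary_rate_limit"
    SECONDARY_RATE_LIMIT = "secondary_rate_limit"
    # Not identifiable from headers alone: could be a permission problem or a
    # secondary rate limit sent without retry-after; the body is needed.
    UNKNOWN = "unknown"


def classify_forbidden(status: int, headers: Mapping[str, str]) -> Optional[ForbiddenKind]:
    """Classify a GitHub 403/429 response from its headers (hypothetical helper)."""
    if status not in (403, 429):
        return None
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    if h.get("x-ratelimit-remaining") == "0":
        # The token's hourly primary quota is used up.
        return ForbiddenKind.PRIMARY_RATE_LIMIT
    if "retry-after" in h:
        # GitHub sends retry-after with many secondary rate-limit responses.
        return ForbiddenKind.SECONDARY_RATE_LIMIT
    return ForbiddenKind.UNKNOWN
```

A 403 that lands in UNKNOWN while x-ratelimit-remaining is still well above zero would point away from the primary rate limit as the cause.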

@khushbu-elastic

@danajuratoni @artem-shelkovnikov Could you please check this and share an update?

@artem-shelkovnikov
Member

@spong can you give it a try again? If it does not work, can we pair to investigate it together?

@moxarth-rathod
Contributor

@spong, does this work for you? If so, can we close this issue? The PR has also been merged to main, so you can give it a try there as well. In the meantime, we're raising a backport PR.

@spong
Member Author

spong commented Aug 12, 2024

Sorry @moxarth-elastic, I've been in and out on PTO and was focused on some release items before that, so I didn't have a chance to confirm or repro. Just catching up on a few things now, but I'll test and confirm all is good here shortly 👍

@spong
Member Author

spong commented Aug 12, 2024

Just pulled the latest from elastic/connectors, then followed these instructions: created a new GitHub connector within Kibana, updated the config.yml, then ran make install / make run. I'm seeing the same error:

[Screenshot]

After that error, if I go and update the List of repositories configuration from security-team to *, and then manually add the repo filter as detailed in the description, it syncs without issue:

[Screenshot]

Let me know if you need any more details or feel free to reach out on slack if you'd like to pair -- happy to help however I can 🙂

@nekrich

nekrich commented Aug 30, 2024

We are experiencing the same situation with rate limits during the initial full sync.
The full sync fails; the incremental sync then scans the same info and also fails after ~20 minutes.

[Screenshot]

@spong
Member Author

spong commented Sep 4, 2024

@moxarth-elastic and I just paired and were able to reproduce this on my machine, running Kibana, ES, and connectors all from source on the main branch. In testing we actually saw some documents get ingested this time, but then the sync errored out with the same error as above. Subsequent syncs failed before ingesting any data.

@moxarth-elastic tried reproducing with my token on both 8.11 and 8.15 cloud deployments, running connectors main locally, and was unable to reproduce the error (all documents synced without issue). So it seems this may only be an issue when running all three applications from source.

@moxarth-rathod
Contributor


Hi @nekrich, we've already fixed this issue in PR #2711. Did you try running the connector against that PR?

@artem-shelkovnikov
Member


That looks weird, as if GitHub started throttling you or flagged our connector as a security risk?

Should we follow up with GitHub on that? @elastic/ingestion-team

@danajuratoni
Contributor


Yes, please! Does this occur for native connectors as well, or only self-managed ones?

@artem-shelkovnikov
Member

@moxarth-elastic @spong, reading a bit about throttling, the limits for API keys are quite strict (5,000 requests per hour).

Was it possible to run a sync after an hour or so? Have you been able to see the rate limits for your account when syncing?
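Rather than guessing at "an hour or so", a client can read how long to wait from the rate-limit headers on the failed response. For scale, 5,000 requests per hour averages out to only about 1.4 requests per second over the full hour. A minimal sketch, assuming GitHub's documented retry-after, x-ratelimit-remaining, and x-ratelimit-reset (UTC epoch seconds) headers; seconds_to_wait is a hypothetical helper, not connector code.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before retrying (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in h:
        # Secondary rate limits: GitHub asks clients to wait this many seconds.
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # Primary rate limit: wait until the window resets (epoch seconds).
        current = time.time() if now is None else now
        return max(0.0, float(h["x-ratelimit-reset"]) - current)
    # No rate-limit signal in the headers: no reason to wait.
    return 0.0
```

Logging this value when a sync fails would also show whether a retry "after an hour or so" actually landed in a fresh window.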

@moxarth-rathod
Contributor


If the problem were related to rate limits, I should have hit this error too, but I was able to ingest the documents of the private repo security-team with the same API token that @spong is using.

I also tested the connector against a Kibana setup on my local machine, but couldn't reproduce the issue there either. In my case, the connector works normally. Here is the log file for reference: github-privaterepo-with-organization.log

@artem-shelkovnikov
Member

I have a feeling it's something weird; maybe GitHub's anti-abuse protection kicks in: https://github.com/orgs/community/discussions/24494?

Or it could be something related to the local setup (routing, VPNs, and such).

@moxarth-elastic, can we add more logging?

Specifically, it would be good to log:

  1. Rate-limit details whenever we're rate-limited. We can output all rate-limit-related info into debug logs so that we can see whether it's related or not.
  2. For any non-200 API response, a debug log with what the GitHub API actually returned.

This will help us better understand what's happening and submit a ticket to GitHub.
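The two logging points could look roughly like the following. This is a minimal sketch, assuming the response status, headers, and body are available as plain values at the call site; log_github_response and the header list are illustrative, based on GitHub's documented rate-limit headers, and are not the code from PR #2816.

```python
import logging
from typing import Mapping

# Rate-limit headers GitHub documents for its API responses.
RATE_LIMIT_HEADERS = (
    "x-ratelimit-limit",
    "x-ratelimit-remaining",
    "x-ratelimit-used",
    "x-ratelimit-reset",
    "x-ratelimit-resource",
    "retry-after",
)


def log_github_response(
    logger: logging.Logger, status: int, headers: Mapping[str, str], body: str
) -> None:
    """Debug-log rate-limit headers and the body of any non-200 response."""
    if status == 200:
        return
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    rate_info = {name: h[name] for name in RATE_LIMIT_HEADERS if name in h}
    # Point 1: all rate-limit-related info, to see whether throttling is involved.
    logger.debug("GitHub API returned %s; rate-limit headers: %s", status, rate_info)
    # Point 2: what the GitHub API actually said.
    logger.debug("GitHub API response body: %s", body)
```

Logging the body matters because a permission 403 and a secondary-rate-limit 403 share a status code and differ mainly in the message text.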

@moxarth-rathod
Contributor


@artem-shelkovnikov Parth has added logs in PR #2816; please take a look and leave any suggestions.
