bug: 403 errors when validating a URL #216
Comments
It seems like part of the problem is related to not identifying a …
This PR seems to have decreased instances of the error, but it still has not eliminated all of them.
The example referenced (http://datos.gob.cl/dataset/c77c9a50-6dd1-449d-b5ab-947ec0139b31/resource/a4edcf07-0657-456d-bbbc-54b2aec1de8d/download/coquimbo10feb16.zip) fails the complete-certificate-chain check in a couple of popular SSL checkers: it looks like it's using an SSL root that isn't widely distributed yet. This could be fixed by updating the root certificates installed at the operating-system level, or by instructing the command that checks whether the URLs can be downloaded to ignore SSL errors. These appear to be Amazon-issued certificates, so it's surprising that the GitHub runners don't come with them installed. Bumping the runner to …

Edit: it looks like Python doesn't use system certificates by default, so this could be a matter of the Python version. This Stack Overflow answer shows how to tell Python to use the system certificates, which might be a good idea here: https://stackoverflow.com/a/42982144/964125

There's also a bigger question of how strict SSL checking should be for a feed to be considered valid. Using the system-installed root certs that come with …
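For reference, `requests` verifies TLS against the CA bundle shipped in the `certifi` package rather than the operating system's trust store, and its `verify=` argument accepts either `True` (use that default) or a path to a CA bundle file. The sketch below prefers an OS bundle when one exists. It assumes an Ubuntu/Debian runner, where the system bundle usually lives at `/etc/ssl/certs/ca-certificates.crt`; `pick_ca_bundle` is a hypothetical helper, not something the project already has.

```python
import os
import ssl
from typing import Iterable, Optional, Union

# Usual location of the OS trust store on Debian/Ubuntu (assumption: Ubuntu-based runner).
DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def pick_ca_bundle(candidates: Optional[Iterable[str]] = None) -> Union[str, bool]:
    """Return a value for the `verify=` argument of requests.

    Returns the first candidate path that is an existing file; otherwise
    True, which makes requests fall back to its default (certifi) bundle.
    """
    if candidates is None:
        # openssl_cafile is the default CA file of the OpenSSL build Python links against.
        candidates = [DEBIAN_CA_BUNDLE, ssl.get_default_verify_paths().openssl_cafile]
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return True
```

Usage would look like `requests.get(url, verify=pick_ca_bundle(), timeout=30)`. Setting the `REQUESTS_CA_BUNDLE` environment variable in the workflow to the same path is an alternative that needs no code change, since `requests` reads it when `trust_env` is enabled (the default).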
@themightychris Thank you for digging into this! Re: Ubuntu, it looks like GitHub Actions haven't updated to …

I added a draft PR that points to the system certificates to see whether that would affect the workflow test, but it doesn't seem to have made a difference (which could definitely be a problem on my end). Do you mind taking a look?

As a short-term solution, we've talked about ignoring the test when it fails and manually verifying that the URL works and downloads a ZIP file. This is obviously not ideal, but it may help with adding feeds while we debug and evaluate the certificate problem.
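The manual check mentioned above (the URL works and returns a ZIP file) could be scripted roughly as follows. This is a standard-library sketch; `looks_like_zip_feed` is a hypothetical name, and it only checks that the downloaded bytes form an intact, non-empty ZIP archive, not that the contents are valid GTFS.

```python
import io
import zipfile


def looks_like_zip_feed(data: bytes) -> bool:
    """Return True if `data` is an intact ZIP archive with at least one member.

    An HTML error page (e.g. a 403 body) or a typical truncated download returns False.
    """
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    try:
        with zipfile.ZipFile(buf) as zf:
            # testzip() returns the name of the first corrupt member, or None if all CRCs match.
            return bool(zf.namelist()) and zf.testzip() is None
    except zipfile.BadZipFile:
        return False
```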
ubuntu-latest was upgraded to 22.04 in Nov 2022: actions/runner-images#6512
These URLs could all be added as HTTP instead of HTTPS for now, so at least we'd have the data. I checked that they all work over HTTP except transporlis. Hopefully all URLs in the database will be switched to HTTPS at some point.
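A sketch of that fallback idea: try the HTTPS URL first and, only if it fails with a TLS error, retry the same URL over plain HTTP. The fetch function is injected so the logic can be tested without network access; `fetch_with_http_fallback` and `http_variant` are hypothetical helpers. Plain HTTP gives up transport integrity, so this is a stopgap rather than a fix.

```python
import ssl
from typing import Callable, Tuple, Type, TypeVar
from urllib.parse import urlsplit, urlunsplit

T = TypeVar("T")


def http_variant(url: str) -> str:
    """Return the URL with https:// swapped for http://; other URLs are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url
    return urlunsplit(parts._replace(scheme="http"))


def fetch_with_http_fallback(
    url: str,
    fetch: Callable[[str], T],
    tls_errors: Tuple[Type[BaseException], ...] = (ssl.SSLError,),
) -> Tuple[str, T]:
    """Fetch `url`; on a TLS error, retry once over plain HTTP.

    Returns (url_actually_used, result). Non-TLS errors, and TLS errors on a
    URL that is already http://, propagate unchanged.
    """
    try:
        return url, fetch(url)
    except tls_errors:
        fallback = http_variant(url)
        if fallback == url:
            raise
        return fallback, fetch(fallback)
```

With `requests`, the call might be `fetch_with_http_fallback(url, lambda u: requests.get(u, timeout=30), tls_errors=(requests.exceptions.SSLError,))`, because `requests` wraps certificate failures in its own `SSLError` rather than raising `ssl.SSLError` directly.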
Thanks for investigating this further, @dancory-urbanfootprint! We're working on resolving the core issue right now so that fewer feeds are blocked regardless of HTTP settings, but what you suggest is a good workaround if we run into issues. cc @AlfredNwolisa
What problem is your feature request trying to solve?
One of the GitHub workflow checks evaluates whether the GTFS feed can be downloaded. Often this check returns

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url

apparently due to SSL certificate errors, even when the URL can be downloaded manually. This becomes a blocker to adding new sources. Example here. Other examples where this is affecting our ability to get feeds:

- mdb-534: http://www.centro.org/CentroGTFS/CentroGTFS.zip
- https://www.fayettevillenc.gov/home/showpublisheddocument/16121/638612293378070000
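It may help to separate the two failure modes discussed in this thread. A `403 Client Error` means the server sent back an HTTP response, so for an HTTPS URL the TLS handshake (including certificate verification) had already succeeded; a certificate problem surfaces instead as `requests.exceptions.SSLError`, before any status code exists. The sketch below shows the same distinction with the standard library's `urllib` so it runs without third-party packages; `classify_download_error` and its category labels are hypothetical.

```python
import ssl
import urllib.error


def classify_download_error(exc: BaseException) -> str:
    """Map a urllib download exception to a coarse failure category."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered with a status code, so any TLS handshake already succeeded.
        return f"http-{exc.code}"
    # URLError wraps the underlying cause in .reason; other exceptions are inspected directly.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, ssl.SSLCertVerificationError):
        return "tls-certificate"
    if isinstance(cause, ssl.SSLError):
        return "tls-other"
    return "network-or-other"
```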
Describe the solution you'd like
Add headers to the requests made by the Python operations (adding and updating GTFS Schedule and GTFS Realtime feeds).
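A minimal sketch of the proposed fix, assuming the 403s come from servers or CDNs that reject requests carrying a default library User-Agent (`requests` sends `python-requests/<version>` unless told otherwise). If a server blocks by IP range instead, headers alone may not help. The header values below are hypothetical placeholders, not ones the project has chosen, and the sketch uses `urllib.request` so it has no third-party dependencies.

```python
import urllib.request
from typing import Mapping, Optional

# Hypothetical header values: the issue does not specify which headers to send.
DEFAULT_HEADERS = {
    "User-Agent": "mdb-feed-check/0.1 (hypothetical identifier)",
    "Accept": "application/zip, application/octet-stream, */*",
}


def build_feed_request(url: str, extra_headers: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build a GET request carrying explicit headers instead of urllib's defaults."""
    headers = dict(DEFAULT_HEADERS)
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers, method="GET")


def download_feed(url: str, timeout: float = 30.0) -> bytes:
    """Download the feed body; raises urllib.error.HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(build_feed_request(url), timeout=timeout) as resp:
        return resp.read()
```

With `requests`, the same dictionary can be passed per call (`requests.get(url, headers=DEFAULT_HEADERS, timeout=30)`) or set once on a `requests.Session` via `session.headers.update(DEFAULT_HEADERS)`, so the add and update operations share it.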
How will we know when this is done?
As a user, I can add a source when the source is downloadable manually.