EuroSAT Dataset 403 Download Error #2424

Open
isaiahlg opened this issue Nov 22, 2024 · 8 comments · May be fixed by #2432
Labels
datasets Geospatial or benchmark datasets

Comments

@isaiahlg

isaiahlg commented Nov 22, 2024

Description

I'm trying to download the EuroSAT dataset, and I'm getting a 403 Forbidden HTTP error.

Image

Steps to reproduce

  1. Ensure you have a directory where the EuroSAT dataset is not already downloaded
  2. Run the following line of code to download the EuroSAT dataset, with path/to/data set to the empty directory from step 1: dataset = EuroSAT("path/to/data", download=True)
  3. You should see the .zip file download, but then you'll get the following error before the .txt files download:

Image

Version

0.6.1

@isaaccorley
Collaborator

Seems like it was able to download the images but not the files containing the splits. These were the splits from the In Domain Representation Learning paper hosted by the authors from Google. Let me see if I have them stored anywhere and I'll rehost them to HuggingFace.

@adamjstewart
Collaborator

I think the only reason we didn't originally host those on HF was due to an unknown license, but I doubt Google would care about these files.

@adamjstewart adamjstewart added this to the 0.6.2 milestone Nov 23, 2024
@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Nov 23, 2024
@adamjstewart
Collaborator

UC Merced is also affected by this.

@tameTNT

tameTNT commented Nov 26, 2024

I was able to get the notebook (colab link if you would like to do it yourself) to work and generate the files. I've attached the ones for EuroSAT and UC Merced (the two mentioned) for anyone who runs into this issue before they fix the storage bucket/link issue - hope that helps for the time being!

eurosat-train.txt
eurosat-val.txt
eurosat-test.txt

uc_merced-train.txt
uc_merced-val.txt
uc_merced-test.txt
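
For anyone placing these files by hand, a quick check of which split files are still missing from a directory may help. This is only a sketch: the file names come from the attachments above, while the helper name `missing_split_files` and the idea that the files sit directly in the dataset root are assumptions, not something confirmed by the TorchGeo loaders.

```python
from pathlib import Path

# Split names taken from the attachments in this thread.
SPLITS = ("train", "val", "test")


def missing_split_files(root: str, prefix: str) -> list[str]:
    """Return the split file names (e.g. 'eurosat-train.txt') not found in root.

    Hypothetical helper: assumes the split files live directly in `root`.
    """
    root_dir = Path(root)
    expected = [f"{prefix}-{split}.txt" for split in SPLITS]
    return [name for name in expected if not (root_dir / name).is_file()]
```

Calling `missing_split_files("path/to/data", "eurosat")` or `missing_split_files("path/to/data", "uc_merced")` returns an empty list once all three files for that dataset are in place.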

@adamjstewart
Collaborator

I commented on the other issue too: these splits are different from the original version. I'm not sure whether the script properly sets random seeds, so it may be impossible to regenerate the exact same split unless someone has the old files lying around somewhere.
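
To illustrate why the seed matters, here is a minimal sketch of a reproducible split. It is not the notebook's actual method: the function name `split_names`, the 60/20/20 fractions, and the seed value are all assumptions chosen for illustration. The point is only that a fixed seed plus a fixed starting order gives the same split on every run, and without both the result can change.

```python
import random


def split_names(names, seed, val_fraction=0.2, test_fraction=0.2):
    """Shuffle names with a fixed seed and cut them into train/val/test lists.

    Illustrative only: fractions and seed are not taken from the original script.
    """
    ordered = sorted(names)  # fix the starting order so only the seed matters
    rng = random.Random(seed)  # local generator; global random state is untouched
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_fraction)
    n_test = int(len(ordered) * test_fraction)
    val = ordered[:n_val]
    test = ordered[n_val:n_val + n_test]
    train = ordered[n_val + n_test:]
    return train, val, test
```

Two calls with the same names and the same seed return identical lists, regardless of the order the names were passed in. If the original script did not pin its seed this way, regenerated files would not be expected to match the originals.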

@tameTNT

tameTNT commented Nov 27, 2024

As luck would have it, on the machine I'm now on I do actually have them lying around from downloading the datasets earlier last month. I didn't realise the generated ones would be different, apologies for the earlier confusion.

eurosat-train.txt (original)
eurosat-val.txt (original)
eurosat-test.txt (original)

uc_merced-train.txt (original)
uc_merced-val.txt (original)
uc_merced-test.txt (original)

@adamjstewart
Collaborator

adamjstewart commented Nov 27, 2024

YES! I can confirm these are the original splits for EuroSAT and UC Merced based on MD5 checksums. I'll upload these to our HF account and fix the TorchGeo data loaders.
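
For anyone who wants to run the same kind of check on their own copies, an MD5 comparison can be sketched with the standard library as below. The helper names are hypothetical, and the expected checksum has to come from a trusted source (none are listed in this thread), so treat this as a sketch rather than the exact check used here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_md5):
    """Compare a file's MD5 against an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `matches_md5("eurosat-train.txt", expected)` returns True only when the local file is byte-for-byte identical to the file that produced `expected`.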
