Trying to upload a classification dataset #1022

Open
1 task done
AlhanuofA opened this issue Feb 21, 2025 · 7 comments
Labels
bug (Something isn't working), classify (Image Classification issues, PR's), HUB (Ultralytics HUB issues)

Comments

@AlhanuofA

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

Datasets

Bug

Hello there,

All of my attempts to upload a dataset have been unsuccessful.
Most of the time, the upload fails immediately, as shown in the screenshot below:
[Image]
Checking the zip file in code confirmed that it is valid:
[Image]

Just to note, I was able to successfully upload a smaller version of the dataset.
Please let me know how I can resolve this issue and successfully upload the dataset.

Your support is highly appreciated!

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

@AlhanuofA AlhanuofA added the bug Something isn't working label Feb 21, 2025
@UltralyticsAssistant UltralyticsAssistant added classify Image Classification issues, PR's HUB Ultralytics HUB issues labels Feb 21, 2025
@UltralyticsAssistant
Member

👋 Hello @AlhanuofA, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

From your description, it seems you're encountering issues uploading your dataset. If this is a 🐛 Bug Report, could you please provide a minimum reproducible example (MRE) for us to better understand and identify the issue? You can find guidance on creating an MRE here. Specifically for dataset uploads, it would be helpful to have:

  1. Details about the dataset structure and size (e.g., file format, directory organization, number of images, etc.).
  2. Steps you followed during the upload process.
  3. Screenshots (like the ones you’ve already provided—thank you!) capturing the error dialogues or any logs.

If this is a ❓ Question, it would be helpful to provide additional context, such as:

  • Dataset details
  • Environment details (browser, operating system, etc.)
  • Any specific error logs, apart from the screenshots you shared

We try to respond to all issues as promptly as possible. One of our engineers will review your issue and assist you further soon. Thank you for your patience! 🌟

@sergiuwaxmann
Member

@AlhanuofA Hello!
Do you have enough storage available?
Do you have a stable internet connection?

@AlhanuofA
Author

> @AlhanuofA Hello! Do you have enough storage available? Do you have a stable internet connection?

Hello!
Yes, the available storage is 200 GB, and the dataset is 19 GB.
While my internet connection may not be perfect, I have successfully uploaded multiple datasets to other sites using it.

@pderrenger
Member

@AlhanuofA Thanks for the additional information. Since storage and internet don't seem to be the primary issues, let's explore other possibilities:

  1. Dataset Structure: Double-check that your dataset .zip file is correctly structured. It should contain a data.yaml file at the root level, and the directory structure within the .zip should match the paths specified in your YAML. Refer to the Datasets documentation, specifically the "Upload Dataset" section, for a visual example. You can also download and unzip the COCO8 example to see the expected structure.

  2. Dataset Validation: Before uploading, it's crucial to validate your dataset using the check_dataset function. This helps identify potential errors that might cause upload failures. Here's how to use it:

    from ultralytics.hub import check_dataset
    
    check_dataset("path/to/your/dataset.zip", task="classify")  # Replace with your actual path and task

    Make sure to replace "path/to/your/dataset.zip" with the actual path to your .zip file and task="classify" with your dataset's task (e.g., "detect", "segment", "classify"). This will check for common issues. See the ultralytics.hub.check_dataset reference for details.

  3. File Size Limits: Though your dataset is 19GB, there might be underlying file size limitations for individual files within the .zip. If your dataset contains very large individual image files, this could potentially cause problems. Try to keep the file size for each image as small as possible.

  4. Intermittent Upload Failures: Even with a reasonably stable connection, intermittent network issues can disrupt large uploads. It might be worth trying the upload at a different time, or from a different network, if possible, to rule out transient network problems.

  5. HUB-Specific Issues: If you've verified all of the above and the problem persists, there may be a temporary issue with Ultralytics HUB itself. If you suspect this, check the open issues to see whether others are reporting similar problems.
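
For a classification dataset, one quick way to sanity-check the archive layout from point 1 before uploading is to count images per split and class directly from the .zip. The following is a minimal sketch using only the standard library. It assumes the folder-per-class layout (train/<class>/..., with optional val/ and test/) that Ultralytics classification datasets commonly use; the function name, the split names, and the extension list are illustrative assumptions, not part of the HUB or SDK API.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed image extensions and split folder names (illustrative, adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
SPLITS = {"train", "val", "test"}


def summarize_classify_zip(zip_source):
    """Count images per (split, class) inside a classification dataset zip.

    Assumes a <root>/<split>/<class>/<image> layout; the leading <root> folder is optional.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if path.suffix.lower() not in IMAGE_EXTS:
                continue
            parts = path.parts
            # Look for the split folder among the parent folders (excluding the
            # class folder and file name), so an optional root folder is tolerated.
            for i, part in enumerate(parts[:-2]):
                if part in SPLITS:
                    counts[(part, parts[i + 1])] += 1
                    break
    return counts

# Usage (path is a placeholder):
# for (split, cls), n in sorted(summarize_classify_zip("path/to/your/dataset.zip").items()):
#     print(f"{split}/{cls}: {n} images")
```

An empty result, or a class that appears in one split but not another, would point to a layout problem rather than a network problem.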

The upload_dataset function in the SDK is used for uploading. You can see the documentation here. It takes the dataset id and file as parameters. The API reference documentation may also be useful.

If none of these steps resolve the issue, providing more details about the specific error messages you encounter (beyond the screenshot) could help pinpoint the cause. For example, are there any error messages in the browser's developer console (usually accessed by pressing F12)?

@AlhanuofA
Author

AlhanuofA commented Feb 23, 2025

@pderrenger Hello, thank you for your response, I appreciate it.

Here are the responses for each point:

  1. Dataset Structure: The structure is valid; uploading smaller datasets with 10 to 30 images per class and subset has been successful. Those images are taken from the same dataset that I am attempting to upload.
  2. Dataset Validation: the dataset validity was verified several times, and a screenshot of the code was included in the main post.
  3. File Size Limits: currently working on resizing each image to a smaller size without affecting the quality. If possible, please share the individual size limit or estimate of it. EDIT: the image sizes vary from 59 to 130 KB
  4. Intermittent Upload Failures: Unfortunately, I can't tell if there are internet issues. I've tried multiple networks and have been working on uploading throughout the day.
  5. HUB-Specific Issues: I have reviewed all the issues I could find and am still searching for more.

When uploading the dataset to Ultralytics HUB from Colab and Drive, I receive the following error:

2025-02-22 22:48:36,098 - hub_sdk.helpers.logger - ERROR - Failed to upload dataset for dataset(Dataset ID): cannot access local variable 'response' where it is not associated with a value
ERROR:hub_sdk.helpers.logger:Failed to upload dataset for dataset(Dataset ID): cannot access local variable 'response' where it is not associated with a value

I am trying alternative approaches and will post an update.

@pderrenger
Member

@AlhanuofA thank you for the detailed follow-up! Let's address this systematically:

Regarding the SDK Error
The cannot access local variable 'response' error appears to stem from a temporary SDK issue. This is being actively investigated by our engineering team. As a workaround:

  1. Update your SDK: Ensure you're using the latest version with:

    pip install -U hub-sdk
  2. Try direct web upload: Use the Ultralytics HUB Upload Dataset interface which handles large files more gracefully, especially through browsers.
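
For context, the wording "cannot access local variable 'response' where it is not associated with a value" is what Python 3.11+ prints for an UnboundLocalError. That usually means a variable was only assigned on the success path, so an earlier failure (for example a dropped connection) gets masked by this secondary error. The sketch below reproduces the pattern with hypothetical functions; it is not the hub-sdk source, and the assumption that a network error is the underlying cause is unverified.

```python
def upload(send):
    """Hypothetical uploader: `response` is only bound if `send()` succeeds."""
    try:
        response = send()
    except ConnectionError as exc:
        # The real failure is only printed; execution continues past the except block.
        print(f"request failed: {exc}")
    return response  # raises UnboundLocalError when send() raised above


def upload_safely(send):
    """Same flow, but the original failure is surfaced instead of being masked."""
    try:
        response = send()
    except ConnectionError as exc:
        raise RuntimeError(f"upload failed: {exc}") from exc
    return response


def failing_send():
    """Stand-in for a request that drops mid-upload."""
    raise ConnectionError("connection reset")


try:
    upload(failing_send)
except UnboundLocalError as exc:
    print(f"masked error: {exc}")
```

If this is what happens inside the SDK, the full traceback or any log line printed just before the error is more informative than the final message.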

Image Size Limits
There's no strict per-image size limit, but we recommend:

  • Keeping images <10MB each for optimal performance
  • Using compressed formats like JPEG (your 59-130KB range is perfect)
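
To confirm the per-file sizes across the whole archive rather than by spot-checking, the zip's directory can be scanned without extracting it. This is a small standard-library sketch; the function name is illustrative, and the 10 MB default simply mirrors the recommendation above rather than any documented hard limit.

```python
import zipfile


def entry_size_stats(zip_source, limit_bytes=10 * 1024 * 1024):
    """Return (file count, smallest, largest, names over limit) using uncompressed sizes."""
    with zipfile.ZipFile(zip_source) as zf:
        files = [info for info in zf.infolist() if not info.is_dir()]
    sizes = [info.file_size for info in files]
    oversized = [info.filename for info in files if info.file_size > limit_bytes]
    return len(files), min(sizes, default=0), max(sizes, default=0), oversized

# Usage (path is a placeholder):
# count, smallest, largest, oversized = entry_size_stats("path/to/your/dataset.zip")
# print(count, smallest, largest, oversized[:10])
```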

Colab-Specific Recommendation
For large datasets via Colab, consider:

from google.colab import files
files.download('path/to/your/dataset.zip')  # Then manually upload via HUB web interface

Validation Check
Since your dataset validation passed, let's verify the YAML structure matches the Ultralytics Dataset YAML Format. A common oversight is relative paths in the YAML - ensure they match your zip's internal structure exactly.

Next Steps
If the issue persists, please share:

  1. Exact dataset YAML contents (redacting sensitive info)
  2. Full error trace from Colab
  3. hub-sdk version from pip show hub-sdk
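
In a Colab notebook, the installed versions can also be read from Python instead of the shell. A minimal sketch using the standard library's importlib.metadata; the helper name is illustrative, and the package names are the pip distribution names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a pip package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("hub-sdk", "ultralytics"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```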

Our team is committed to resolving this promptly. For enterprise-scale uploads, consider our Cloud Training Solution which bypasses local upload constraints. 🚀

@yogendrasinghx
Member

Hi @AlhanuofA,

Thank you for reaching out. Please follow the dataset upload guide in our YouTube video to ensure you're following the correct process:

📺 Video Tutorial: How to Upload a Dataset to Ultralytics HUB

Additionally, before uploading, I recommend verifying your dataset using the Ultralytics Python package by following the official documentation:

📖 Ultralytics Hub Docs: Dataset Preparation and Validation

Let me know if you need further assistance!


5 participants