loading dataset issue with load_dataset() when training controlnet #7298

Open
bigbraindump opened this issue Nov 26, 2024 · 0 comments

Describe the bug

I'm unable to load my dataset for ControlNet training with load_dataset(), but load_from_disk() works on the same data.
I'd appreciate it if someone could explain why the two behave differently here.

  1. For reference, here is the structure of the original training files before dataset creation:
- dir train
     - dir A (illustrations)
     - dir B (SignWriting)
     - prompt.json containing:
       {"source": "B/file.png", "target": "A/file.png", "prompt": "..."}
  2. Here are the features after dataset creation:
  "features": {
    "control_image": {
      "_type": "Image"
    },
    "image": {
      "_type": "Image"
    },
    "caption": {
      "dtype": "string",
      "_type": "Value"
    }
  }
  3. I also tried uploading the dataset to the Hugging Face Hub, with the same error output.
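
To make the mapping between the two layouts above concrete, here is a minimal sketch of how prompt.json records could be turned into the three columns. It is not the dataset creation script from this issue. The function name `records_to_columns` is hypothetical, and the sketch assumes prompt.json holds one JSON object per line with the `source`, `target`, and `prompt` keys shown above.

```python
import json
from pathlib import Path


def records_to_columns(prompt_lines, root):
    """Map prompt.json records to control_image / image / caption columns.

    Assumption: each non-empty line is one JSON object with the keys
    "source", "target" and "prompt", and the paths are relative to `root`.
    """
    root = Path(root)
    columns = {"control_image": [], "image": [], "caption": []}
    for line in prompt_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "source" is the conditioning image (dir B), "target" the illustration (dir A)
        columns["control_image"].append(str(root / record["source"]))
        columns["image"].append(str(root / record["target"]))
        columns["caption"].append(record["prompt"])
    return columns
```

A dictionary of this shape holds file paths only. Turning the two path columns into `Image` features, as in the feature listing above, is a separate step that is not shown here.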

Steps to reproduce the bug

  1. dataset creation script

  2. controlnet training script used

  3. training parameters -

! accelerate launch diffusers/examples/controlnet/train_controlnet.py \
--pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
--output_dir="$OUTPUT_DIR" \
--train_data_dir="$HF_DATASET_DIR" \
--conditioning_image_column=control_image \
--image_column=image \
--caption_column=caption \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./validation/0a4b3c71265bb3a726457837428dda78.png" "./validation/0a5922fe2c638e6776bd62f623145004.png" "./validation/1c9f1a53106f64c682cf5d009ee7156f.png" \
--validation_prompt "An illustration of a man with short hair" "An illustration of a woman with short hair" "An illustration of Barack Obama" \
--train_batch_size=4 \
--num_train_epochs=500 \
--tracker_project_name="sd-controlnet-signwriting-test" \
--hub_model_id="sarahahtee/signwriting-illustration-test" \
--checkpointing_steps=5000 \
--validation_steps=1000 \
--report_to wandb \
--push_to_hub

  4. command -
    sbatch --export=HUGGINGFACE_TOKEN=hf_token,WANDB_API_KEY=api_key script.sh

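Expected behavior

The dataset should load and training should start. Instead, the run fails with the following output: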

11/25/2024 17:12:18 - INFO - __main__ - Initializing controlnet weights from unet
Generating train split: 1 examples [00:00, 334.85 examples/s]
Traceback (most recent call last):
  File "/data/user/user/signwriting_illustration/controlnet_huggingface/diffusers/examples/controlnet/train_controlnet.py", line 1189, in <module>
    main(args)
  File "/data/user/user/signwriting_illustration/controlnet_huggingface/diffusers/examples/controlnet/train_controlnet.py", line 923, in main
    train_dataset = make_train_dataset(args, tokenizer, accelerator)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/user/signwriting_illustration/controlnet_huggingface/diffusers/examples/controlnet/train_controlnet.py", line 639, in make_train_dataset
    raise ValueError(
ValueError: `--image_column` value 'image' not found in dataset columns. Dataset columns are: _data_files, _fingerprint, _format_columns, _format_kwargs, _format_type, _output_all_columns, _split
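
One possible reading of this error, offered as a guess rather than a confirmed diagnosis: the column names in the ValueError (`_data_files`, `_fingerprint`, `_format_columns`, ...) look like the keys of the state.json file that `Dataset.save_to_disk()` writes, and "Generating train split: 1 examples" is consistent with a single JSON file being parsed as the data. If so, the directory passed as `--train_data_dir` is a `save_to_disk()` output, which `load_from_disk()` understands but `load_dataset()` treats as a folder of raw data files. The sketch below uses only the standard library to check a directory for that file; the helper name `describe_dataset_dir` is hypothetical.

```python
import json
from pathlib import Path


def describe_dataset_dir(path):
    """Report whether `path` contains a state.json file and list its top-level keys.

    Assumption: a directory written by Dataset.save_to_disk() contains a
    state.json file next to the Arrow data files.
    """
    state_file = Path(path) / "state.json"
    if not state_file.is_file():
        return {"has_state_json": False, "state_keys": []}
    with state_file.open(encoding="utf-8") as f:
        state = json.load(f)
    # If these keys match the "Dataset columns" in the error message, the
    # JSON metadata file was most likely read as if it were the dataset.
    return {"has_state_json": True, "state_keys": sorted(state)}
```

Running `describe_dataset_dir` on the directory behind `$HF_DATASET_DIR` and comparing `state_keys` with the columns in the error message would confirm or rule out this explanation.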

Environment info

accelerate 1.1.1
huggingface-hub 0.26.2
python 3.11
torch 2.5.1
transformers 4.46.2
