
Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination

⭐ ⭐ ICLR 2025 ⭐ ⭐

Pritam Sarkar   Sayna Ebrahimi   Ali Etemad  
Ahmad Beirami   Sercan Ö. Arık   Tomas Pfister


Figure: overview of our data-augmented phrase-level alignment (DPA) training.

Setup environment

conda create -n halva python=3.10 -y
conda activate halva
pip install --upgrade pip
pip install -r req.txt
module load cuda/11.7.1  # only if your cluster uses environment modules; otherwise ensure a CUDA 11.7 toolkit is available
pip install flash-attn --no-build-isolation
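
As an optional sanity check (not part of the original instructions), you can confirm that PyTorch sees your GPUs and that flash-attention imports cleanly:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"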

Try HALVA!

We share a minimal setup to quickly try our HALVA! See this notebook.
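
If you prefer the command line over the notebook, a minimal sketch along the following lines should also work, assuming the HALVA LoRA checkpoints load through LLaVA's standard run_llava entry point; the model and image paths below are placeholders, and the notebook remains the authoritative reference.

python -m llava.eval.run_llava \
    --model-path /path/to/halva7b-lora \
    --model-base liuhaotian/llava-v1.5-7b \
    --image-file /path/to/image.jpg \
    --query "Describe this image in detail."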

Model weights

Training HALVA

Data

Generative data-augmented contrastive samples

  • Vision-language instructions and their correct and hallucinated responses are available here: data
  • Download the images from Visual Genome and save part 1 and part 2 as data/vg/VG_100K and data/vg/VG_100K_2, respectively (see the sketch below)
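
A minimal layout sketch, assuming the two Visual Genome image archives were downloaded as images.zip and images2.zip; adjust the archive names to whatever you obtained from the source.

mkdir -p data/vg
unzip images.zip -d data/vg/    # expected to produce data/vg/VG_100K/
unzip images2.zip -d data/vg/   # expected to produce data/vg/VG_100K_2/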

Reference samples

  • A random subset of llava_v1_5_mix665k.json. For reproducibility, we share the exact subset used in our study: ref data
  • Image sources (the expected layout is sketched after this list):
    • MSCOCO - download them as data/MSCOCO2017
    • TextVQA - download them as data/textvqa
    • GQA - download them as data/gqa
    • OCR-VQA - download them as data/ocr_vqa
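
Putting the pieces together, the image roots referenced above are expected to sit under data/ roughly as follows; the directory names come from the instructions above, so adjust them if your setup differs.

data/
  vg/VG_100K/
  vg/VG_100K_2/
  MSCOCO2017/
  textvqa/
  gqa/
  ocr_vqa/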

Train

  • The base model LLaVA-v1.5 weights can be found here: 7B and 13B.
  • We use 4 A100 80GB GPUs for training; the 7B and 13B variants take 1.5 and 3 hours to train, respectively. If you are using different GPUs, please make sure to match our default batch_size × gradient accumulation steps so that the default hyperparameters remain optimal (see the example after this list).
  • The following scripts can be used to train HALVA with LLaVA-1.5 as the base model:
    • HALVA-7B: src/hallava_7b.sh
    • HALVA-13B: src/hallava_13b.sh
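
For example, a 4-GPU run could be launched as follows. CUDA_VISIBLE_DEVICES is shown only for illustration, and the batch-size and gradient-accumulation settings are assumed to live inside the scripts (the code base builds on LLaVA), so scale them together if your GPU count differs.

##### train the 7B variant on 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 bash src/hallava_7b.sh

##### train the 13B variant on 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 bash src/hallava_13b.sh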

Evaluation on hallucination benchmarks

Choose a HALVA variant and its base model. We provide sample validation scripts for evaluation; please make sure to update the paths based on your setup.

MODEL="halva13b-lora"
MODEL_BASE="liuhaotian/llava-v1.5-13b"

# OR

MODEL="halva7b-lora"
MODEL_BASE="liuhaotian/llava-v1.5-7b"

CHAIR

  • Download the validation images from MSCOCO2014 and store them as data/MSCOCO2014/val2014. We use the same 500 validation images as prior work.
  • You can use the given sample script for evaluation.
##### run chair
bash src/evaluate_hall/chair.sh ${MODEL} ${MODEL_BASE}

MME-Hall

  • MME-Hall is a subset of MME covering the existence, count, position, and color categories.
  • You can follow the official instructions for MME evaluation (link) and download the MME benchmark.
  • Once the data is downloaded, you can use the given sample script for evaluation.
##### run mme
bash src/evaluate_hall/mme.sh ${MODEL} ${MODEL_BASE}

AMBER

  • Download the validation images from the source repo AMBER and keep them as data/amber/image/.
  • Download the annotation data directory and save it as eval_hall/amber/data.
  • Once the data is downloaded, you can use the given sample script for evaluation.
##### run the four AMBER evaluations in parallel on 4 GPUs if available; otherwise run them sequentially by removing the trailing &
bash src/evaluate_hall/amber.sh g ${MODEL} ${MODEL_BASE} 0 &
bash src/evaluate_hall/amber.sh da ${MODEL} ${MODEL_BASE} 1 &
bash src/evaluate_hall/amber.sh dr ${MODEL} ${MODEL_BASE} 2 &
bash src/evaluate_hall/amber.sh de ${MODEL} ${MODEL_BASE} 3 &
wait
# get amber f1 for all discriminative tasks
bash src/evaluate_hall/amber_f1.sh ${MODEL}

MMHal-Bench

  • The validation data will be directly downloaded from HuggingFace. You can use the given sample script for evaluation.
##### run mmhal-bench
bash src/evaluate_hall/mmhal.sh ${MODEL} ${MODEL_BASE} 0

HallusionBench

  • Download the validation images from link and save them in data/hallusion_bench.
  • Download the annotation files from link and save them in eval_hall/hallusion_bench.
  • For more details, you can check the official repo. You can use the given sample script for evaluation.
##### run hallusion-bench
bash src/evaluate_hall/hallusionbench.sh ${MODEL} ${MODEL_BASE} 0

Evaluation on general vision-language tasks

In addition to the above evaluation on hallucination benchmarks, we also evaluate on general vision-language benchmarks. For those, we directly follow the evaluation instructions in the LLaVA repo.

VILA

The above instructions mainly apply to the LLaVA-1.5-based checkpoints; the VILA-based code can be found inside the *_vila directories.

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing it with the BibTeX entry below:

@misc{sarkar2024halva,
      title={Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination}, 
      author={Pritam Sarkar and Sayna Ebrahimi and Ali Etemad and Ahmad Beirami and Sercan Ö. Arık and Tomas Pfister},
      year={2024},
      eprint={2405.18654},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

This code base is built upon LLaVA and VILA.

Contact me

You may directly contact me at [email protected] or connect with me through LinkedIn.
