Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attribution and Intervention

Overview

This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attribution and Intervention" (ICLR 2025).

This paper addresses the following problems:

  1. Attribution of Hallucination Components: We systematically identify and localize the components most responsible for hallucination generation in LVLMs. Specifically, we show that multi-head attention (MHA) modules, particularly certain heads in the middle and deeper layers, are key contributors.

  2. Analysis of Attention Bias: We show that hallucination heads strongly favor previously generated text over visual inputs. We also reveal that this pattern is inherited from the base language model and changes slowly during the visual instruction tuning process.

  3. Hallucination Mitigation Techniques: We develop two mitigation strategies: a training-free decoding-time method and a targeted fine-tuning method. Both reduce over-reliance on text tokens and achieve a significant reduction in hallucination rates, outperforming existing baselines.

Demo Scripts on LLaVA

In the following, we provide code for the attribution, analysis, and intervention of hallucination heads, based on the LLaVA v1.5-7B model, in the LLaVA folder.

Setup

cd LLaVA
pip install -e .

Identify Hallucination Heads

Download COCO train2014 and val2014 images from here and put them in dataset/coco.

Please comment out self._validate_model_kwargs(model_kwargs.copy()) in transformers/generation/utils.py in the source code of your installed transformers. This step is required because some model_kwargs parameters in our source code are newly added and aren't yet supported by the Transformers package, so this validation call would otherwise reject them.
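The edit amounts to disabling a single validation call; roughly, inside GenerationMixin.generate() (the exact location varies across transformers versions):

# transformers/generation/utils.py, inside GenerationMixin.generate():
# self._validate_model_kwargs(model_kwargs.copy())   # <- comment out this check so the
#                                                    #    repo's extra model_kwargs pass through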

sh ./bash_scripts/attribute.sh

The attribution result is saved in $result_path/identify_attention_head/attribution_result.json.
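As a quick sanity check, the result file can be inspected with a few lines of Python. The snippet below is only a sketch: the schema shown (head identifiers mapped to scalar attribution scores) and the result path are assumptions; adapt them to the actual contents of attribution_result.json.

import json

# Replace "results" with your actual $result_path.
result_file = "results/identify_attention_head/attribution_result.json"
with open(result_file) as f:
    scores = json.load(f)

# Assumed, hypothetical schema: {"<layer>-<head>": attribution_score, ...}
top_heads = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]
for head_id, score in top_heads:
    print(f"{head_id}: {score:.4f}")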

Analyze the Behaviour of Hallucination Heads

  1. The attention bias of hallucination heads, corresponding to Figure 3 in the paper (a conceptual sketch of this metric follows the list).
sh ./bash_scripts/analysis/attention_bias.sh

  2. The inheritance of attention patterns from base language models, corresponding to Figure 4 in the paper.
sh ./bash_scripts/analysis/attention_inheritance.sh

  3. The JS divergence of the attention map from the initial model throughout the instruction tuning process, corresponding to Figure 5 in the paper (a generic JS-divergence sketch follows the list). We provide the checkpoints saved during the instruction tuning process here; put them in checkpoints/llava-v1.5-7b-instruction-tuning.
sh ./bash_scripts/analysis/js_div_in_training.sh

  4. The effect of downscaling the text attention of hallucination and random heads, corresponding to Figure 6(a) in the paper.
sh ./bash_scripts/analysis/attention_reweight_txt.sh

  5. The effect of upscaling the image attention of hallucination and random heads, corresponding to Figure 6(b) in the paper.
sh ./bash_scripts/analysis/attention_reweight_img.sh
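For item 1, the quantity of interest is, conceptually, how each head splits its attention mass between image tokens and previously generated text tokens. The sketch below illustrates this metric on placeholder tensors; the prompt length, sequence length, and token index ranges are arbitrary assumptions rather than the repo's actual interface (576 is the number of image tokens in LLaVA v1.5).

import torch

def attention_allocation(attn, img_range, txt_range):
    # attn: (num_heads, seq_len) attention weights of the current query token
    # img_range / txt_range: (start, end) index ranges of image / generated text tokens
    img_mass = attn[:, img_range[0]:img_range[1]].sum(dim=-1)
    txt_mass = attn[:, txt_range[0]:txt_range[1]].sum(dim=-1)
    return img_mass, txt_mass

# Toy example: a 35-token prompt prefix (placeholder), 576 image tokens, then text.
attn = torch.rand(32, 700)
attn = attn / attn.sum(dim=-1, keepdim=True)   # normalize to a distribution per head
img, txt = attention_allocation(attn, (35, 35 + 576), (35 + 576, 700))
text_bias = txt / (img + txt + 1e-12)          # larger value -> more text-reliant head
print(text_bias)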
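For item 3, the JS divergence between a head's attention distribution at a tuning checkpoint and in the initial model can be computed as below. This is a generic sketch on random vectors, not the repo's script, which runs over the provided instruction-tuning checkpoints.

import torch

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two attention distributions over the same tokens.
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (torch.log(a + eps) - torch.log(b + eps))).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

attn_initial = torch.softmax(torch.randn(700), dim=-1)   # attention of the initial model
attn_current = torch.softmax(torch.randn(700), dim=-1)   # attention at a tuning checkpoint
print(js_divergence(attn_initial, attn_current).item())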

Intervention

  1. Training-free intervention method based on adaptive deactivation of hallucination heads (a simplified head-deactivation sketch follows this list). We provide the intervention code for LLaVA v1.5-7B, LLaVA v1.5-13B, and LLaVA v1.6-34B.
sh ./bash_scripts/decoding.sh
  2. Training-based intervention method based on targeted fine-tuning of hallucination heads. We use the llava_v1_5_mix665k.json from LLaVA as the training data.
sh ./bash_scripts/targeted_finetune.sh
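To make the intervention idea concrete, the sketch below shows one simple way to silence specific attention heads in a LLaMA-style decoder (as used by LLaVA v1.5) by zeroing their slice of the o_proj input. This is a simplified, unconditional deactivation for illustration only, not the paper's adaptive method; the (layer, head) pairs are hypothetical placeholders (e.g. taken from attribution_result.json).

import torch

HALLUCINATION_HEADS = [(20, 5), (24, 13)]   # hypothetical (layer, head) pairs

def register_head_deactivation(model, heads):
    # Zero the per-head input slice of o_proj so the listed heads contribute nothing.
    handles = []
    by_layer = {}
    for layer_idx, head_idx in heads:
        by_layer.setdefault(layer_idx, []).append(head_idx)
    for layer_idx, head_idxs in by_layer.items():
        attn = model.model.layers[layer_idx].self_attn   # LLaMA-style decoder layer
        head_dim = attn.head_dim

        def pre_hook(module, args, head_idxs=head_idxs, head_dim=head_dim):
            hidden = args[0].clone()
            for h in head_idxs:
                hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
            return (hidden,) + args[1:]

        handles.append(attn.o_proj.register_forward_pre_hook(pre_hook))
    return handles   # call handle.remove() on each handle to restore the model

# handles = register_head_deactivation(model, HALLUCINATION_HEADS)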

Comparison with Baseline Methods

We provide the evaluation code for the following baseline methods and our method in the baselines folder. The implementations of the baseline methods are mostly based on HALC. The evaluation results in our paper are produced with this code to ensure a fair comparison with the baselines:

  • Greedy decoding
  • VCD
  • DOLA
  • HALC
  • OPERA

Setup

conda env create -f environment.yml
conda activate baselines

Download pretrained_minigpt4_llama2_7b.pth from here and put it in the directory baselines/model_checkpoints.

Download groundingdino_swint_ogc.pth from here and put it in the directory baselines/decoder_zoo/GroundingDINO/weights.

Evaluation

Evaluate the baseline methods: Greedy decoding, VCD, DOLA, HALC, OPERA

cd baselines
sh ./bash_scripts/eval_baselines.sh

Evaluate our methods: TFHH and ADHH

sh ./bash_scripts/eval_ours_adhh.sh
sh ./bash_scripts/eval_ours_tfhh.sh

Evaluation on Newer Models

Our method is also applicable to newer models. We provide the evaluation code for the following newer models in the newer_models folder. Note that evaluating these models requires a higher version of transformers; we use transformers==4.45.2. For reference, we provide the naive method of completely removing the hallucination heads of these newer models (a sketch reusing the head-deactivation hook follows the commands below).

  • Llama3.2-11B
  • Chameleon-7B
  • Chameleon-30B
cd newer_models
sh decoding.sh
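As an illustration of this naive head-removal approach, the deactivation hook sketched in the Intervention section can be reused for these models. The snippet below is an assumption-laden example: it presumes Chameleon's decoder exposes the same model.model.layers[i].self_attn.o_proj layout as LLaMA, and the head list is a placeholder.

import torch
from transformers import ChameleonForConditionalGeneration

model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

# Reuse register_head_deactivation() from the Intervention sketch above.
handles = register_head_deactivation(model, [(20, 5), (24, 13)])   # placeholder heads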

Acknowledgement

This repo is based on the MLLM codebase of LLaVA and the baseline implementation of HALC. Thanks for their impressive work!
