
# VLSBench: Unveiling Information Leakage in Multimodal Safety

📢 We are currently organizing the code for VLSBench. If you are interested in our work, please star ⭐ our project.

## 🔥 Updates

📆[2024-12-16] 🎈 Thanks to @paperweekly for sharing our work: Chinese Blog 🎈

📆[2024-12-16] 🎈 We release the model checkpoints used in our paper: Qwen2-VL-VLGuard and Qwen2-VL-SafeRLHF 🎈

📆[2024-11-26] 🎈 Our paper, code and dataset are released! 🎈

## 🎉 Introduction


Safety concerns about multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works report a counter-intuitive phenomenon: aligning MLLMs with textual unlearning achieves safety performance comparable to MLLMs trained with image-text pairs. To explain this counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has already been revealed in the textual query. In this way, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs that prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL.

## ⚙️ Dataset

You can download our dataset from Hugging Face; the JSON file can also be quickly accessed here.
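If you prefer to load the Hugging Face copy programmatically, a minimal sketch with the `datasets` library looks like this (the repo id below is an assumption; use the id shown on the dataset's Hugging Face page if it differs):

```python
# Minimal sketch: load VLSBench via the Hugging Face `datasets` library.
# NOTE: the repo id is an assumption -- replace it with the id from the
# dataset's Hugging Face page if it differs.
from datasets import load_dataset

dataset = load_dataset("Foreshhh/vlsbench", split="train")  # hypothetical repo id

print(len(dataset))       # number of image-text pairs (~2.4k)
print(dataset[0].keys())  # inspect the fields of one record
```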


You can check some examples here.


## 🚀 Usage

Our code supports several archs (passed via `--arch`):

- `openai`: API-based models in the OpenAI format. If you are using OpenAI APIs, remember to specify the `model_name` and your customized `api_key` and `api_base` in `load_openai.py` (see the sketch after this list).
- `llava`: the original LLaVA implementation
- `llava_hf`: the Hugging Face implementation of LLaVA
- `llava_next`: the Hugging Face implementation of LLaVA 1.6 and above
- `qwen2vl`: for Qwen2-VL
- `mllama`: for Llama3.2-Vision
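For the `openai` arch, the values you fill in (`model_name`, `api_key`, `api_base`) correspond to a standard OpenAI-format client. The snippet below is only a rough sketch of that kind of configuration, not the actual code in `load_openai.py`:

```python
# Rough sketch of an OpenAI-format client configuration; the values below are
# placeholders and this is NOT the repository's load_openai.py implementation.
from openai import OpenAI

model_name = "gpt-4o"                      # the API model to evaluate
client = OpenAI(
    api_key="sk-...",                      # your customized api_key
    base_url="https://api.openai.com/v1",  # your customized api_base
)

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```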

First, download the dataset from Hugging Face and set the downloaded directory as `ROOT_DIR`.

Next, specify the evaluation API key used here.

Then, execute the following script:

python eval.py --arch llava --data_root $ROOT_DIR --output_dir ./outputs

## 📑 Citation

@article{hu2024vlsbench,
      title={VLSBench: Unveiling Visual Leakage in Multimodal Safety}, 
      author={Xuhao Hu and Dongrui Liu and Hao Li and Xuanjing Huang and Jing Shao},
      journal={arXiv preprint arXiv:2411.19939},
      year={2024}
}