
# VLSBench: Unveiling Information Leakage in Multimodal Safety

📢 We are currently organizing the code for VLSBench. If you are interested in our work, please star ⭐ our project.

## 🔥 Updates

📆[2024-12-16] 🎈 Thanks to @paperweekly for sharing our work: Chinese Blog 🎈

📆[2024-12-16] 🎈 We release the model checkpoints used in our paper: Qwen2-VL-VLGuard and Qwen2-VL-SafeRLHF 🎈

📆[2024-11-26] 🎈 Our paper, code and dataset are released! 🎈

## 🎉 Introduction


Safety concerns about multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works report a counter-intuitive phenomenon: aligning MLLMs with textual unlearning achieves safety performance comparable to MLLMs trained with image-text pairs. To explain this counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has already been revealed in the textual query. In this way, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs that prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL.

## ⚙️ Dataset

You can download our dataset from Hugging Face; the JSON file can also be quickly accessed here.
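If you prefer to load the Hugging Face copy programmatically, a minimal sketch with the `datasets` library looks like this (the repo id below is an assumption; use the id shown on the dataset's Hugging Face page if it differs):

```python
# Minimal sketch: load VLSBench via the Hugging Face `datasets` library.
# NOTE: the repo id is an assumption -- replace it with the id from the
# dataset's Hugging Face page if it differs.
from datasets import load_dataset

dataset = load_dataset("Foreshhh/vlsbench", split="train")  # hypothetical repo id

print(len(dataset))       # number of image-text pairs (~2.4k)
print(dataset[0].keys())  # inspect the fields of one record
```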


You can check some examples here.


## 🚀 Usage

Our code supports several archs (passed via `--arch`):

- `openai`: API-based models in the OpenAI format. If you are using OpenAI APIs, remember to specify the `model_name` and your customized `api_key` and `api_base` in `load_openai.py` (see the sketch after this list).
- `llava`: the original LLaVA implementation
- `llava_hf`: the Hugging Face implementation of LLaVA
- `llava_next`: the Hugging Face implementation of LLaVA 1.6 and above
- `qwen2vl`: for Qwen2-VL
- `mllama`: for Llama3.2-Vision
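For the `openai` arch, the values you fill in (`model_name`, `api_key`, `api_base`) correspond to a standard OpenAI-format client. The snippet below is only a rough sketch of that kind of configuration, not the actual code in `load_openai.py`:

```python
# Rough sketch of an OpenAI-format client configuration; the values below are
# placeholders and this is NOT the repository's load_openai.py implementation.
from openai import OpenAI

model_name = "gpt-4o"                      # the API model to evaluate
client = OpenAI(
    api_key="sk-...",                      # your customized api_key
    base_url="https://api.openai.com/v1",  # your customized api_base
)

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```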

First, download the dataset from Hugging Face and set the downloaded directory as `ROOT_DIR`.

Next, specify the evaluation API key used here.

Then, execute the following script:

python eval.py --arch llava --data_root $ROOT_DIR --output_dir ./outputs

## 📑 Citation

@article{hu2024vlsbench,
      title={VLSBench: Unveiling Visual Leakage in Multimodal Safety}, 
      author={Xuhao Hu and Dongrui Liu and Hao Li and Xuanjing Huang and Jing Shao},
      journal={arXiv preprint arXiv:2411.19939},
      year={2024}
}