VLSBench: Unveiling Information Leakage in Multimodal Safety

📢 We are currently organizing the code for VLSBench. If you are interested in our work, please star ⭐ our project.

🔥 Updates

📆[2024-12-16] 🎈 Thanks to @paperweekly for sharing our work: Chinese Blog 🎈

📆[2024-12-16] 🎈 We release the model checkpoints used in our paper: Qwen2-VL-VLGuard and Qwen2-VL-SafeRLHF 🎈

📆[2024-11-26] 🎈 Our paper, code and dataset are released! 🎈

🎉 Introduction

Intro_img

Safety concerns about multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works report a counter-intuitive phenomenon: aligning MLLMs with textual unlearning alone achieves safety performance comparable to MLLMs trained on image-text pairs. To explain this counter-intuitive phenomenon, we identify a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has already been revealed in the textual query. As a result, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs, which prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL.

⚙️ Dataset

You can download our dataset from Hugging Face; the JSON file can also be quickly accessed here.
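
Below is a minimal sketch of loading the benchmark for quick inspection with the datasets library; the Hub dataset ID AI45Lab/VLSBench and the train split name are assumptions and may differ from the actual release.

# Minimal sketch: load VLSBench from the Hugging Face Hub for quick inspection.
# The dataset ID "AI45Lab/VLSBench" and the "train" split are assumptions.
from datasets import load_dataset

dataset = load_dataset("AI45Lab/VLSBench", split="train")
print(len(dataset))   # number of image-text pairs (2.4k in the paper)
print(dataset[0])     # a single record: an image with its textual query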

Intro_img

You can check some examples here

Intro_img

🚀 Usage

Our code supports several architectures (archs):

  • openai: API-based models with the OpenAI format. If you are using OpenAI APIs, remember to specify the model_name and your customized api_key and api_base in load_openai.py (see the sketch after this list).
  • llava: the original LLaVA implementation
  • llava_hf: the Hugging Face implementation of LLaVA
  • llava_next: the Hugging Face implementation of LLaVA 1.6 and above
  • qwen2vl: for Qwen2-VL
  • mllama: for Llama3.2-Vision
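
For the openai arch, the following is a minimal sketch of the kind of configuration load_openai.py expects; the exact structure in the repository may differ, and the model name and endpoint shown are placeholders.

# Minimal sketch of an OpenAI-format client setup; the actual layout of
# load_openai.py may differ. model_name, api_key, and api_base are the
# values mentioned above and must be customized.
from openai import OpenAI

model_name = "gpt-4o"                    # placeholder: the API model to evaluate
api_key = "sk-..."                       # your customized API key
api_base = "https://api.openai.com/v1"   # or any OpenAI-compatible endpoint

client = OpenAI(api_key=api_key, base_url=api_base)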

First, download the dataset from Hugging Face and specify the downloaded directory as ROOT_DIR.
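
A minimal sketch of this download step using huggingface_hub is shown below; the dataset repo ID AI45Lab/VLSBench is an assumption.

# Minimal sketch: fetch the dataset files to a local directory and use that
# directory as ROOT_DIR. The repo_id "AI45Lab/VLSBench" is an assumption.
from huggingface_hub import snapshot_download

ROOT_DIR = snapshot_download(repo_id="AI45Lab/VLSBench", repo_type="dataset")
print(ROOT_DIR)  # pass this path to eval.py as --data_root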

Next, specify the evaluation API key used here.

Then, execute the following script:

python eval.py --arch llava --data_root $ROOT_DIR --output_dir ./outputs

📑 Citation

@article{hu2024vlsbench,
      title={VLSBench: Unveiling Visual Leakage in Multimodal Safety}, 
      author={Xuhao Hu and Dongrui Liu and Hao Li and Xuanjing Huang and Jing Shao},
      journal={arXiv preprint arXiv:2411.19939},
      year={2024}
}
