Authors: Pritam Sarkar and Ali Etemad
This repository provides the official implementation of RRPO.
Clone the repository and navigate to the RRPO directory:
git clone https://github.com/pritamqu/RRPO
cd RRPO
This repository supports three Large Video Language Models (LVLMs), each with its own dependency requirements:
- VideoChat2: videochat2.txt
- LLaVA-Video: llavavideo.txt
- LongVU: longvu.txt
For example, to set up the environment for LLaVA-Video (follow similar steps for the other models):
conda create -n llava python=3.10 -y
conda activate llava
pip install -r llavavideo.txt
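The same steps can be repeated for all three models. The following is a minimal sketch, not part of the repository: the environment names `videochat2` and `longvu` are assumptions (only `llava` appears above), and it assumes Python 3.10 also suits the other two models. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: one conda environment per supported model.
# Only the "llava" environment name and the three requirement files come from
# this README; the other environment names and the shared Python version are
# assumptions.
set -e

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create an environment and install one requirements file into it.
setup_env() {
  env_name="$1"
  req_file="$2"
  run conda create -n "$env_name" python=3.10 -y
  # `conda run` avoids needing `conda activate` inside a non-interactive script.
  run conda run -n "$env_name" pip install -r "$req_file"
}

setup_env videochat2 videochat2.txt   # hypothetical environment name
setup_env llava      llavavideo.txt   # environment name used above
setup_env longvu     longvu.txt       # hypothetical environment name
```

Keeping one environment per model avoids conflicts between the three dependency sets.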
The self-aligned LVLMs trained with the RRPO loss are released on Hugging Face. While these models were trained using LoRA, we also provide their merged weights to allow for direct use in evaluation and inference tools.
Model | LoRA | Merged Weights |
---|---|---|
VideoChat2_stage3_Mistral_7B-RRPO-16f | pritamqu/VideoChat2_stage3_Mistral_7B-RRPO-16f-LORA | - |
LLaVA-Video-7B-Qwen2-RRPO-16f | pritamqu/LLaVA-Video-7B-Qwen2-RRPO-16f-LORA | pritamqu/LLaVA-Video-7B-Qwen2-RRPO-16f |
LLaVA-Video-7B-Qwen2-RRPO-32f | pritamqu/LLaVA-Video-7B-Qwen2-RRPO-32f-LORA | pritamqu/LLaVA-Video-7B-Qwen2-RRPO-32f |
LongVU_Qwen2_7B-RRPO-16f | pritamqu/LongVU_Qwen2_7B-RRPO-16f-LORA | pritamqu/LongVU_Qwen2_7B-RRPO-16f |
You can download the weights as follows:
git clone git@hf.co:pritamqu/LLaVA-Video-7B-Qwen2-RRPO-32f-LORA
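Cloning over SSH requires an SSH key registered with your Hugging Face account. As an alternative, the sketch below clones the same repository over HTTPS. The helper names `hf_https_url` and `clone_weights` are hypothetical, and the script assumes Git LFS is installed, since the weight files are stored with it. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: clone a released checkpoint over HTTPS instead of SSH.
# Helper names are hypothetical; they are not part of this repository.
set -e

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Map a Hugging Face model repo id (user/name) to its HTTPS clone URL.
hf_https_url() {
  echo "https://huggingface.co/$1"
}

# Clone one model repository, enabling Git LFS first.
clone_weights() {
  url="$(hf_https_url "$1")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ git lfs install"
    echo "+ git clone $url"
  else
    git lfs install   # weight files are stored with Git LFS
    git clone "$url"
  fi
}

clone_weights pritamqu/LLaVA-Video-7B-Qwen2-RRPO-32f-LORA
```

Any other repository id from the table above can be passed to `clone_weights` in the same way.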
Our training data is released as the Self-Alignment Dataset. It contains the preferred and non-preferred responses used in self-alignment training.
git clone git@hf.co:datasets/pritamqu/self-alignment
The related videos can be downloaded from their original sources. Please check the VideoChat2-IT GitHub page for details on downloading the source videos.
We also share additional details on how to use your own data here.
Before training, make sure to prepare the data and download the weights of the base models. Then you can launch the training jobs as:
VideoChat2
bash scripts/videochat2/run.sh
LLaVA-Video
bash scripts/llavavideo/run.sh
LongVU
bash scripts/longvu/run.sh
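The three launch commands above follow one pattern, `scripts/<model>/run.sh`. A small wrapper can select the script by model name and check that it exists before launching. This is only a sketch: the function names `train_script` and `launch_training` are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: pick the training script from a model name and launch it.
# Function names are hypothetical; only the script paths come from this README.
set -e

# Print the run script path for a supported model, or fail for anything else.
train_script() {
  case "$1" in
    videochat2|llavavideo|longvu)
      echo "scripts/$1/run.sh" ;;
    *)
      echo "unknown model: $1 (expected videochat2, llavavideo, or longvu)" >&2
      return 1 ;;
  esac
}

# Launch training after a pre-flight check that the script is present.
launch_training() {
  script="$(train_script "$1")" || return 1
  if [ ! -f "$script" ]; then
    echo "missing $script; run this from the RRPO repository root" >&2
    return 1
  fi
  bash "$script"
}

# Example: launch_training llavavideo
```

The pre-flight check catches the common mistake of launching from outside the repository root.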
The links to the base model weights are:
We provide a simple setup for running inference with our trained models.
VideoChat2
bash scripts/inference_videochat2.sh
LLaVA-Video
bash scripts/inference_llavavideo.sh
LongVU
bash scripts/inference_longvu.sh
RRPO improves over the base model and matches or outperforms DPO on most benchmarks; the exceptions are visible in the table below (for example, VideoChat2 on MVBench).
Models | #F | TVBench | TempCompass | VideoHallucer | VidHalluc | MVBench | VideoMME | MLVU | LongVideoBench |
---|---|---|---|---|---|---|---|---|---|
VideoChat2 | 16 | 44.0 | 59.3 | 23.1 | 73.3 | 60.2 | 41.0 | 46.4 | 40.4 |
VideoChat2 + DPO | 16 | 45.7 | 60.0 | 22.1 | 72.4 | 59.6 | 43.0 | 47.4 | 41.0 |
VideoChat2 + RRPO | 16 | 45.8 | 60.2 | 32.9 | 76.4 | 59.0 | 44.3 | 47.9 | 42.8 |
LLaVA-Video | 64 | 51.0 | 66.0 | 50.0 | 76.6 | 61.1 | 64.0 | 68.6 | 60.1 |
LLaVA-Video + DPO | 64 | 51.9 | 66.4 | 53.3 | 76.5 | 60.6 | 63.1 | 67.4 | 59.4 |
LLaVA-Video + RRPO | 64 | 51.9 | 66.8 | 55.7 | 76.5 | 62.2 | 64.5 | 69.1 | 60.4 |
LLaVA-Video + RRPO (32f) | 64 | 52.2 | 67.4 | 55.8 | 76.6 | 62.1 | 64.5 | 69.4 | 60.1 |
LongVU | 1fps | 53.7 | 63.9 | 39.2 | 67.3 | 65.5 | 56.2 | 63.6 | 48.6 |
LongVU + DPO | 1fps | 54.3 | 64.3 | 40.9 | 68.5 | 65.9 | 56.6 | 63.6 | 49.4 |
LongVU + RRPO | 1fps | 56.5 | 64.5 | 44.0 | 71.7 | 66.8 | 57.7 | 64.5 | 49.7 |
You can download evaluation benchmarks from the given links:
Next, you can run the full set of evaluations by following the instructions provided here.
If you find this work useful, please consider citing our paper:
@article{sarkar2025rrpo,
title={Self-Alignment of Large Video Language Models with Refined Regularized Preference Optimization},
author={Sarkar, Pritam and Etemad, Ali},
journal={arXiv preprint arXiv:2504.12083},
year={2025}
}
This project incorporates datasets and model checkpoints that are subject to their respective original licenses. Users must adhere to the terms and conditions specified by these licenses. The assets used in this work include, but are not limited to: VideoChat2-IT, VideoChat2_stage3_Mistral_7B, LLaVA-Video-7B-Qwen2, LongVU_Qwen2_7B. This project does not impose any additional constraints beyond those stipulated in the original licenses. Users must ensure their usage complies with all applicable laws and regulations. This repository is released under the Apache 2.0 License. See LICENSE for details.
For any issues or questions, please open an issue or contact Pritam Sarkar at [email protected]!