
Implementing video open-ended question answering on the Next-GQA dataset with the LLaVA-1.6 and GPT-4o mini models, using a sliding window sampling method.
Explore the docs »
View Demo
·
Report Bug
·
Request Feature
Install the environment:
Operating System:
Conda Version:
Python Version:
CUDA Version:
Main site-packages:
tqdm
moviepy
opencv-python
openai==1.14.0
torch==2.2.0
bitsandbytes==0.42.0
flash_attn==2.5.3
transformers==4.36.2
transformers-stream-generator==0.0.4
torchvision==0.17.0
pytorchvideo @ git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
Run the following code to install the required packages:
pip install -r requirements.txt
Configure the object tracking module:
Copy the files from the SAMTrack directory into your site-packages path to enable the object tracking functionality.
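If you prefer to script this step, here is a minimal sketch that copies the SAMTrack files into the active environment's site-packages directory. The source path ./SAMTrack is an assumption; adjust it to where the directory sits in your checkout.

```python
# Minimal sketch: copy the repo's SAMTrack files into site-packages.
import shutil
import site
from pathlib import Path

src = Path("./SAMTrack")               # location of the SAMTrack files in this repo (assumed)
dst = Path(site.getsitepackages()[0])  # first site-packages directory of the active environment

for item in src.iterdir():
    target = dst / item.name
    if item.is_dir():
        shutil.copytree(item, target, dirs_exist_ok=True)  # merge into any existing package dir
    else:
        shutil.copy2(item, target)
print(f"Copied SAMTrack files into {dst}")
```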
We use a large-scale video question-answering dataset, which you can access and download from here.
Run the following code to test the experimental results without sliding window sampling (using uniform sampling across the entire video):
python eval_gpt4v_openended.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
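For reference, uniform sampling simply picks frames evenly spaced over the full video. The sketch below illustrates the idea with OpenCV; the function name and the frame count of 6 are our assumptions, not taken from the script.

```python
# Illustrative sketch of uniform frame sampling across a whole video.
import cv2
import numpy as np

def sample_uniform_frames(video_path, num_frames=6):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total - 1, num_frames, dtype=int)  # evenly spaced frame indices
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```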
Run the following code to test the experimental results without video input:
python eval_gpt4v_openended_novideo.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_novideo/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
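In this baseline the model only sees the question text. Below is a minimal sketch of such a text-only call, assuming the OpenAI Python client (openai==1.14.0) and the gpt-4o-mini chat model; the system prompt and example question are purely illustrative, and the actual prompt template is presumably defined in eval_gpt4v_openended_novideo.py.

```python
# Sketch of a text-only (no video) question-answering call.
from openai import OpenAI

client = OpenAI(api_key="<your gpt4o-mini api key>")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer the question about a video in a few words."},
        {"role": "user", "content": "Question: what did the boy do after he fell?"},  # illustrative question
    ],
)
print(response.choices[0].message.content)
```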
Run the following code to test the experimental results without evidence segments (i.e., the segments containing the ground-truth evidence have been removed from the video):
python eval_gpt4v_openended_woevidence_separate.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_woevidence/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
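Conceptually, this setting cuts the ground-truth evidence span out of each video before sampling. Here is a rough moviepy sketch of that cut, assuming the span is given in seconds; how the spans are read from the NextGQA grounding annotations is not shown.

```python
# Sketch: remove the (start_s, end_s) evidence span from a video with moviepy.
from moviepy.editor import VideoFileClip, concatenate_videoclips

def remove_segment(video_path, start_s, end_s, out_path):
    clip = VideoFileClip(video_path)
    parts = []
    if start_s > 0:
        parts.append(clip.subclip(0, start_s))          # part before the evidence span
    if end_s < clip.duration:
        parts.append(clip.subclip(end_s, clip.duration))  # part after the evidence span
    edited = concatenate_videoclips(parts)
    edited.write_videofile(out_path, audio=False)
    clip.close()
```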
Run the following code to test the experimental results of the Ground setting, i.e., using the ground-truth evidence segment (extracting 6 frames, separate):
python eval_gpt4v_openended_separate_ground.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate_ground/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
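Here the 6 frames are drawn from within the ground-truth span rather than the whole video. A sketch of that sampling, assuming the span is given in seconds as in the NextGQA grounding annotations:

```python
# Sketch: sample frames evenly within the ground-truth (start_s, end_s) span.
import cv2
import numpy as np

def sample_grounded_frames(video_path, start_s, end_s, num_frames=6):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(start_s * fps, min(end_s * fps, total - 1), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```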
Run the following code to test the experimental results of selecting answers by perplexity under the sliding window method (stride 15 / window size 30, extracting 6 frames, separate):
python eval_gpt4v_openended_sliding_separate.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
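The idea is to answer the question once per window (window size 30, stride 15, 6 frames each) and keep the answer whose generation has the lowest perplexity. The sketch below shows only the selection logic; answer_with_logprobs is a hypothetical wrapper around the vision-language model call.

```python
# Sketch: sliding-window answering with perplexity-based answer selection.
import math

def sliding_windows(total_frames, window=30, stride=15):
    starts = range(0, max(total_frames - window, 0) + 1, stride)
    return [(s, min(s + window, total_frames)) for s in starts]

def pick_answer_by_perplexity(frames, question, answer_with_logprobs):
    best = None
    for start, end in sliding_windows(len(frames)):
        step = max((end - start) // 6, 1)
        window_frames = frames[start:end:step][:6]       # 6 evenly spaced frames per window
        answer, logprobs = answer_with_logprobs(window_frames, question)
        ppl = math.exp(-sum(logprobs) / len(logprobs))   # lower perplexity = more confident generation
        if best is None or ppl < best[0]:
            best = (ppl, answer, (start, end))
    return best  # (perplexity, answer, window)
```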
Run the following code to test the experimental results with the addition of Object Segmentation & Tracking (SAMTrack) under ground-truth conditions:
python eval_gpt4v_openended_separate_ground_track.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate_ground_samtrack/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
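The intuition is that tracker masks highlight the relevant objects in the grounded frames before they are passed to the model. A very rough sketch of the overlay step; samtrack_masks is a hypothetical wrapper, not the real SAMTrack interface (see the files copied into site-packages for that).

```python
# Sketch: overlay per-frame object masks on the sampled frames.
import cv2

def overlay_masks(frames, samtrack_masks, color=(0, 255, 0), alpha=0.4):
    masks = samtrack_masks(frames)        # hypothetical: one binary numpy mask per frame
    out = []
    for frame, mask in zip(frames, masks):
        overlay = frame.copy()
        overlay[mask.astype(bool)] = color                       # paint tracked objects
        out.append(cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0))
    return out
```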
Run the following code to test the experimental results of selecting answers by confidence (with a maximum score of 1000) under the sliding window method (stride 15 / window size 30, extracting 6 frames, separate); this is the current best-performing setting (QA-Acc: 39.80, IoP: 27.12, GQA: 13.2):
python eval_gpt4v_openended_sliding_separate_confidence.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate_confidence/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
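Here each window's answer comes back with a self-reported confidence score in [0, 1000], and the answer from the highest-scoring window is kept. A sketch of the selection logic only, with ask_window as a hypothetical wrapper around the GPT-4o mini call:

```python
# Sketch: sliding-window answering with confidence-based answer selection.
def pick_answer_by_confidence(frames, question, ask_window, window=30, stride=15):
    best_answer, best_conf = None, -1
    for start in range(0, max(len(frames) - window, 0) + 1, stride):
        answer, confidence = ask_window(frames[start:start + window], question)  # confidence in [0, 1000]
        if confidence > best_conf:
            best_answer, best_conf = answer, confidence
    return best_answer, best_conf
```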
To be added ...
Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the Unlicense. See LICENSE.txt for more information.
To be added ...