Questions about VideoChat2_HD #194
Comments
Thanks for trying it out! I will fix it later~ |
@LiJiaqi96 Please give it a try. I have updated the code. The |
Thanks! I tried "train_it_ds.py" without using deepspeed, but it doesn't work. Is it possible to train without deepspeed? For now I would prefer not to use it. |
Yes! You can run it without deepspeed. BTW, show me your log so that I can fix the bug ~ |
Sorry for the late reply. The log is here
|
I'm not sure whether it is caused by the deepspeed or pytorch versions. My environment:
torch 1.13.1+cu117
torchaudio 0.13.1+cu117
torchnet 0.0.4
torchvision 0.14.1+cu117
deepspeed 0.14.2
transformers 4.40.1
BTW, sometimes you can fix the bug by changing |
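To compare a local environment against the package versions quoted above, a small check like the following may help. It is only a sketch: the version strings are copied from the comment, while the helper names (`installed_version`, `compare_versions`) are made up for illustration and are not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the comment above (the maintainer's environment).
REFERENCE_VERSIONS = {
    "torch": "1.13.1+cu117",
    "torchaudio": "0.13.1+cu117",
    "torchnet": "0.0.4",
    "torchvision": "0.14.1+cu117",
    "deepspeed": "0.14.2",
    "transformers": "4.40.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(reference, lookup=installed_version):
    """Map each package whose version differs to an (expected, found) pair."""
    mismatches = {}
    for package, expected in reference.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in compare_versions(REFERENCE_VERSIONS).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a different CUDA suffix (for example a `+cu121` build) is reported as a mismatch even when the base version is the same.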
Thanks, I will create an environment with exactly the same packages and have a try. |
Hi, I found
the |
Thanks for your feedback. I have updated the code. |
I used the new environment except flash-attn, as I used CUDA 12.1 and can only use flash-attn==2.1.0. I ran the code "scripts/videochat_mistral/run_7b_stage4_hd.sh", with "tasks/train_it.py" and deepspeed |
BTW, I tested running the code on a single GPU (like |
Yes, it's okay to use it without deepspeed. I use DeepSpeed ZeRO to reduce GPU memory usage~ |
I see. Is it ok for you to run on multiple GPUs without deepspeed, just as the model runs in videochat2_mistral? |
Update: I managed to solve the previous issue by upgrading the flash-attn to 2.5.9. When I use "train_it_ds.py" and with deepspeed |
Hi! Please try again with the new commit. |
Thanks for your update! Now the code could run with deepspeed enabled. |
Almost all the datasets can be directly downloaded from their repos or homepages~ Let me know if you can't find them. |
These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo. |
Thanks for sharing! |
Another question, how could I obtain the checkpoint after VideoChat2_HD training? in "demo_mistral_hd.ipynb". |
Hi, could you please help me find the instruction json files such as |
Sorry for the late reply. For the checkpoint, you need to use the file named |
@LiJiaqi96 Please check the data in HuggingFace~ |
Thanks for your reply! I will try it~ |
BTW, did you evaluate the effectiveness of the VideoChat2_HD and the newly added datasets, respectively? I'm curious about whether the training scheme or the dataset matters more for the improvement. Thanks! |
We did not run rigorous comparisons, since we wanted to make good use of pretrained models. I think both are important, based on some experiments:
|
My experiment is consistent with your findings. I directly fine-tuned VideoChat2-Stage3 (trained by myself from Stage2, 3 epochs) with HD on the original Stage3 dataset (1 epoch), and the score on MVBench dropped from 56 to 43 ... |
Interesting! I think |
Hi, while downloading the datasets, I could not find the "infovqa". Could you please help me find the dataset? |
It seems to be this dataset: https://www.docvqa.org/datasets/infographicvqa |
Hi, I noticed that the number of DiDeMo videos listed in the json file does not match the Google Drive version. Is there any way to download the full set of DiDeMo videos? Thanks! |
Hi! I do not know how to download DiDeMo, since it was already downloaded on our cluster~ |
Thanks for your reply |
Hi, I downloaded the videos from ShareGPTVideo using the link provided above. When I ran the code, there were errors saying that many files could not be found, such as: v_qx1FNJxiUuE-Scene-001, 1023599998, v_kuJO1VapxuQ-Scene-027. Did you use the "train_300k" subset of ShareGPTVideo? Thanks! |
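Before launching training, it can be useful to list which annotated videos are absent from the download. The sketch below assumes that each video ID corresponds to a file or folder whose name (with or without an extension) equals the ID; how the IDs are read from the instruction json is left out, because the json layout is not shown in this thread. `find_missing_videos` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def find_missing_videos(video_ids, video_dir):
    """Return the sorted IDs that have no matching entry under video_dir.

    Assumption: an ID matches a file or folder whose full name, or whose
    name without its extension, equals the ID.
    """
    available = set()
    for entry in Path(video_dir).rglob("*"):
        available.add(entry.name)  # e.g. "clip.mp4", or a folder of frames
        available.add(entry.stem)  # e.g. "clip"
    return sorted(vid for vid in video_ids if vid not in available)


if __name__ == "__main__":
    # IDs taken from the error report above; the directory is a placeholder.
    ids = ["v_qx1FNJxiUuE-Scene-001", "1023599998", "v_kuJO1VapxuQ-Scene-027"]
    print(find_missing_videos(ids, "."))
```

Running it over all IDs referenced by the annotations gives a quick count of missing videos, which can then be compared with the size of the downloaded subset.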
Hi, thanks for your update of VideoChat2_HD! When trying the newly-released code, I have some questions:
1. The MetaLoader_rs class in "train_it_ds.py" seems to be missing.
2. The load_and_transform_media_data_image function does not have a dynamic_config setting, which is passed to it in "it_dataset_mistral.py". I created a pull request to modify this part.
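A mismatch like the second point (a keyword passed to a function that does not declare it) normally raises a TypeError at call time. One way to detect it earlier is to inspect the function signature. The sketch below uses two stand-in loaders with made-up signatures; only the name dynamic_config comes from the report above, and the real function in the repository may look different.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if param.name == name and param.kind in keyword_kinds:
            return True
    return False


# Hypothetical stand-ins for the loader before and after the fix.
def loader_without_config(ann, data_root):
    return ann, data_root


def loader_with_config(ann, data_root, dynamic_config=None):
    return ann, data_root, dynamic_config


if __name__ == "__main__":
    print(accepts_keyword(loader_without_config, "dynamic_config"))
    print(accepts_keyword(loader_with_config, "dynamic_config"))
```

A caller could use such a check to pass dynamic_config only when the loader supports it, although fixing the signature itself, as the pull request mentioned above intends, is the cleaner option.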