Questions about VideoChat2_HD #194

LiJiaqi96 · 2024-06-12T02:27:07Z

Hi, thanks for the VideoChat2_HD update! While trying the newly released code, I ran into a few questions:

The MetaLoader_rs class in "train_it_ds.py" seems to be missing.
So I still used "train_it.py", but got the following error. I'm not sure whether it could be solved by using MetaLoader_rs.

RuntimeError: stack expects each tensor to be equal size, but got [8, 3, 224, 448] at entry 0 and [8, 3, 448, 672] at entry 1

Then I changed the batch_size to 1, which resolved the previous error. However, the load_and_transform_media_data_image function does not seem to accept the dynamic_config argument that "it_dataset_mistral.py" passes to it. I created a pull request to fix this part.
Is there any place to find the newly added datasets for VideoChat2_HD? I suppose the datasets are important for improving model performance.
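
The stack error above arises because batching stacks samples into a single tensor, which requires identical shapes. Besides setting batch_size to 1, one possible workaround is to group samples by shape so that every batch is uniform. The sketch below is a minimal, framework-free illustration of that idea; `bucket_by_shape` and the sample dictionaries are hypothetical and are not the repository's MetaLoader_rs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Shape = Tuple[int, ...]


def bucket_by_shape(
    samples: Iterable[dict], batch_size: int
) -> Iterator[List[dict]]:
    """Yield batches whose samples all share the same "shape" entry.

    Hypothetical helper: each sample is assumed to be a dict carrying a
    "shape" tuple such as (8, 3, 224, 448).
    """
    pending: Dict[Shape, List[dict]] = defaultdict(list)
    for sample in samples:
        bucket = pending[sample["shape"]]
        bucket.append(sample)
        if len(bucket) == batch_size:
            # Emit a copy, then reuse the bucket for the next batch.
            yield list(bucket)
            bucket.clear()
    # Flush incomplete buckets so no sample is dropped.
    for bucket in pending.values():
        if bucket:
            yield list(bucket)


# Toy samples using the two shapes from the error message.
samples = [
    {"id": 0, "shape": (8, 3, 224, 448)},
    {"id": 1, "shape": (8, 3, 448, 672)},
    {"id": 2, "shape": (8, 3, 224, 448)},
    {"id": 3, "shape": (8, 3, 448, 672)},
    {"id": 4, "shape": (8, 3, 224, 448)},
]
batches = list(bucket_by_shape(samples, batch_size=2))
print([[s["id"] for s in batch] for batch in batches])
```

Each emitted batch contains a single shape, so stacking it would no longer hit the size mismatch. The trade-off is that the last batch per shape may be smaller than batch_size.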


Andy1621 · 2024-06-12T03:09:42Z

Thanks for trying it out! I will fix it later~

Andy1621 · 2024-06-12T03:35:35Z

@LiJiaqi96 Please give it a try; I have updated the code. train_it_ds.py adds deepspeed support and needs some changes.

LiJiaqi96 · 2024-06-12T08:01:10Z

Thanks! I tried "train_it_ds.py" without using deepspeed, but it doesn't work. Is it possible to train without deepspeed? For now I would prefer not to use it.

Andy1621 · 2024-06-12T11:19:34Z

Yes! You can run it without deepspeed. BTW, please show me your log so that I can fix the bug~

LiJiaqi96 · 2024-06-13T07:43:38Z

Sorry for the late reply. The log is attached:
train_log.txt
In "config_7b_hd_stage4.py", I set enable=False in the deepspeed settings
and ran the code with:

torchrun    --nnodes=${NNODE} --nproc_per_node=${NUM_GPUS} \
    --rdzv_endpoint=${MASTER_NODE}:10068 \
    --rdzv_backend=c10d \
    tasks/train_it_ds.py \
    $(dirname $0)/config_7b_hd_stage4.py \
    output_dir ${OUTPUT_DIR}

Andy1621 · 2024-06-13T08:33:32Z

I'm not sure whether it is caused by the deepspeed or PyTorch versions.
Here are the package versions I use:

torch                     1.13.1+cu117
torchaudio                0.13.1+cu117
torchnet                  0.0.4
torchvision               0.14.1+cu117
deepspeed                 0.14.2
transformers              4.40.1
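
To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch using the standard-library `importlib.metadata`; the `compare_versions` helper is hypothetical, and it assumes the installed distribution metadata reports the same version strings (including the `+cu117` suffix) as the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
EXPECTED = {
    "torch": "1.13.1+cu117",
    "torchaudio": "0.13.1+cu117",
    "torchnet": "0.0.4",
    "torchvision": "0.14.1+cu117",
    "deepspeed": "0.14.2",
    "transformers": "4.40.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "MISMATCH"
    print(f"{name:<14} expected {wanted:<14} installed {installed} [{status}]")
```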

BTW, sometimes you can fix the bug by changing find_unused_parameters to True or False.

LiJiaqi96 · 2024-06-13T10:54:25Z

Thanks, I will create an environment with exactly the same packages and have a try.

yuanrr · 2024-06-13T12:48:36Z

Hi, I found that shared_utils_ds.py has a bug at line 58:

optimizer_params = create_optimizer(config.optimizer, model, return_group=True)

optimizer.py may need to be updated accordingly.

Andy1621 · 2024-06-13T20:34:25Z

Thanks for your feedback. I have updated the code.

LiJiaqi96 · 2024-06-18T04:01:02Z

I used the new environment except for flash-attn: since I am on CUDA 12.1, I can only use flash-attn==2.1.0. I ran "scripts/videochat_mistral/run_7b_stage4_hd.sh" with "tasks/train_it.py" and deepspeed enable=False, and got the error in train_log0618.txt. The error seems to be caused by flash-attn.
Is it possible to run videochat2_hd in the same environment as videochat2_mistral, without using deepspeed?

LiJiaqi96 · 2024-06-18T09:00:46Z

BTW, I tested running the code on a single GPU (e.g. python train_it.py) and it iterates normally.

Andy1621 · 2024-06-18T10:33:47Z

Yes, it's okay to use it without deepspeed. I use deepspeed ZeRO to reduce GPU memory usage~

LiJiaqi96 · 2024-06-20T01:12:10Z

I see. Does it run for you on multiple GPUs without deepspeed, just as the model does in videochat2_mistral?

LiJiaqi96 · 2024-06-21T10:17:28Z

Update: I solved the previous issue by upgrading flash-attn to 2.5.9. When I use "train_it_ds.py" with deepspeed enable=True, I hit a new issue with the deepspeed config:
trainlog_0621.txt
Could you please help me solve that?

Andy1621 · 2024-06-22T18:14:11Z

Hi! Please try again with the latest commit.

LiJiaqi96 · 2024-06-24T06:59:32Z

Thanks for your update! The code now runs with deepspeed enabled.
BTW, is there any place to find the newly added datasets for VideoChat2_HD? I suppose the datasets are important for improving model performance.

Andy1621 · 2024-06-25T11:58:53Z

Almost all the datasets can be directly downloaded from their repos or homepages~

Let me know if you can't find them.

LiJiaqi96 · 2024-06-26T06:41:37Z

In "instruction_data.py", there are some newly added image datasets from M3IT and some newly added video datasets. Is there any place to find those video datasets? Thanks!

Andy1621 · 2024-06-26T07:52:31Z

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

LiJiaqi96 · 2024-06-26T09:19:17Z

Thanks for sharing!

LiJiaqi96 · 2024-06-28T02:00:47Z

Another question: how can I obtain the checkpoint after VideoChat2_HD training? "demo_mistral_hd.ipynb" loads it with:
state_dict = torch.load("your_model_path/videochat2/videochat2_hd_mistral_stage4.pth", "cpu")
I noticed that there are several files in the "ckpt_latest.pth" folder. Should I choose one of them?
Thanks!

LiJiaqi96 · 2024-06-28T07:19:09Z

> These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, could you please help me find the instruction json files, such as f"{anno_root_it}/video/caption/sharegptvideo/train_300k.json"? I could not find them in the HF VideoChat2-IT repo.

Andy1621 · 2024-06-28T23:59:39Z

Sorry for the late reply. For the checkpoint, you need to use the file named mp_xxx, which stores the weights. For the instruction data, I will upload it today.
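
As a rough illustration of picking that file out of the checkpoint folder, the sketch below lists entries whose names start with "mp_". The `find_weight_files` helper and the extra file names in the demo are hypothetical; the exact file name and the layout of the loaded object depend on the deepspeed version, so the keys should be inspected before calling load_state_dict.

```python
import tempfile
from pathlib import Path
from typing import List


def find_weight_files(ckpt_dir: str) -> List[Path]:
    """Return files in ckpt_dir whose names start with "mp_", sorted by name."""
    return sorted(p for p in Path(ckpt_dir).glob("mp_*") if p.is_file())


# Demo on a throwaway directory that mimics a checkpoint folder.
# "mp_xxx" is the placeholder name from the reply above; the other two
# file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp) / "ckpt_latest.pth"
    ckpt_dir.mkdir()
    for name in ("mp_xxx", "other_state_a", "other_state_b"):
        (ckpt_dir / name).touch()
    found_names = [p.name for p in find_weight_files(str(ckpt_dir))]

print(found_names)
```

The path returned for a real checkpoint folder could then be passed to torch.load(path, "cpu") in place of the .pth path used in the notebook.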

Andy1621 · 2024-06-29T04:00:17Z

@LiJiaqi96 Please check the data in HuggingFace~

LiJiaqi96 · 2024-06-30T10:44:30Z

Thanks for your reply! I will try it~

LiJiaqi96 · 2024-07-01T07:36:58Z

BTW, did you evaluate the effectiveness of the VideoChat2_HD and the newly added datasets, respectively? I'm curious about whether the training scheme or the dataset matters more for the improvement. Thanks!

Andy1621 · 2024-07-01T08:45:12Z

We did not conduct rigorous comparisons, since we wanted to make good use of the pretrained models.

But I think both are important, based on some experiments:

Stage4: Directly fine-tuning VideoChat2-Stage3 with HD on the original Stage3 dataset improves performance only marginally.
Stage3: Fine-tuning VideoChat2-Stage2 with the Stage4 dataset leads to a performance drop of ~3%.

LiJiaqi96 · 2024-07-02T01:45:12Z

My experiment is consistent with your findings. I directly fine-tuned VideoChat2-Stage3 (trained by myself from Stage2, 3 epochs) with HD on the original Stage3 dataset (1 epoch), and the score on MVBench dropped from 56 to 43 ...

Andy1621 · 2024-07-03T01:47:36Z

Interesting! I think HD needs more high-resolution and high-quality data.

LiJiaqi96 · 2024-08-14T10:13:17Z

> These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, while downloading the datasets, I could not find "infovqa". Could you please help me find this dataset?

LiJiaqi96 · 2024-08-30T09:29:25Z

> > These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.
>
> Hi, while downloading the datasets, I could not find the "infovqa". Could you please help me find the dataset?

It seems to be this dataset: https://www.docvqa.org/datasets/infographicvqa

LiJiaqi96 · 2024-08-31T02:39:35Z

Hi, I noticed that the number of DiDeMo videos referenced in the json file does not match the Google Drive version. Is there any way to download the full set of DiDeMo videos? Thanks!
https://drive.google.com/drive/u/0/folders/1huOL37wNOyMdCzbl8CIvJHDwCu5HLQ5o

Andy1621 · 2024-08-31T12:19:46Z

Hi! I don't know how to download DiDeMo, since it was downloaded on our cluster~

LiJiaqi96 · 2024-09-02T02:35:05Z

Thanks for your reply

LiJiaqi96 · 2024-09-03T03:10:21Z

> These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, I downloaded the videos from ShareGPTVideo using the link provided above. When I ran the code, I got errors saying that many files could not be found, such as: v_qx1FNJxiUuE-Scene-001, 1023599998, v_kuJO1VapxuQ-Scene-027. Did you use the "train_300k" subset of ShareGPTVideo? Thanks!
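
To see how many annotated entries are missing before training starts, a pre-flight check along these lines could be run. This is only a sketch under stated assumptions: the `find_missing_videos` helper is hypothetical, the annotation file is assumed to be a JSON list of dicts, and the "video" key holding a path relative to the video root is a guess that should be adjusted to the real json files.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def find_missing_videos(
    anno_path: str, video_root: str, key: str = "video"
) -> List[str]:
    """List annotation entries with no matching file or folder under video_root.

    Assumptions (adjust to the real data): the annotation file is a JSON
    list of dicts, and each dict stores a path relative to video_root
    under `key`.
    """
    with open(anno_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    root = Path(video_root)
    # exists() accepts both single video files and per-video frame folders.
    return [e[key] for e in entries if not (root / e[key]).exists()]


# Demo with made-up entries: clip_a is present on disk, clip_b is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "videos"
    root.mkdir()
    (root / "clip_a").touch()
    anno = Path(tmp) / "train.json"
    anno.write_text(
        json.dumps([{"video": "clip_a"}, {"video": "clip_b"}]),
        encoding="utf-8",
    )
    missing = find_missing_videos(str(anno), str(root))

print(missing)
```

Running the same function on the real annotation file and video folder gives a list that can be compared against the downloaded subset.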

Andy1621 mentioned this issue Jul 8, 2024

Video source of the updated VideoChat2-it-HD #203

Closed

LiJiaqi96 closed this as completed Aug 30, 2024

LiJiaqi96 reopened this Aug 31, 2024

LiJiaqi96 closed this as completed Sep 2, 2024

LiJiaqi96 reopened this Sep 3, 2024
