visee-sdu/LSDO

TODOs

  • The code for MSRVTT and MSVD datasets
  • The code for video-level LSDOs
  • The code for frame-level LSDOs
  • The code for LSMDC and DiDeMo datasets
  • The code for text-level LSDOs

The official repository for LSDO (Low-Salient but Discriminative Objects).

Dependencies

Our model was trained and evaluated using the following package dependencies:

  • Python 3.7.12
  • PyTorch 1.8.0
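As a quick sanity check (an illustrative snippet, not part of the released code), you can verify that your interpreter meets the minimum Python version before installing the remaining dependencies:

```python
import sys

# The authors trained with Python 3.7.12 and PyTorch 1.8.0.
# Newer versions may work, but results are only reported for these.
REQUIRED_PYTHON = (3, 7)

def environment_ok():
    """Return True if the interpreter meets the minimum Python version."""
    return sys.version_info[:2] >= REQUIRED_PYTHON

print(environment_ok())
```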

Datasets

Our model was trained on the MSR-VTT and MSVD datasets. Please download them using the following commands:

# MSR-VTT
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

# MSVD
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip

Training

We use a single NVIDIA RTX 3090 (24 GB) GPU for training. You can train directly with the following commands:

# MSR-VTT-9k
python train.py --exp_name=exp_name --videos_dir=videos_dir --scene_type=average --batch_size=32 --noclip_lr=3e-5 --dataset_name=MSRVTT --msrvtt_train_file=9k

# MSR-VTT-7K
python train.py --exp_name=exp_name --videos_dir=videos_dir --scene_type=average --batch_size=32 --noclip_lr=1e-5 --dataset_name=MSRVTT --msrvtt_train_file=7k

# MSVD
python train.py --exp_name=exp_name --videos_dir=videos_dir --scene_type=average --batch_size=32 --noclip_lr=1e-5 --dataset_name=MSVD
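The three training invocations above differ only in dataset name, train split, and learning rate. As an illustrative sketch, they can be generated programmatically; the flag values are copied verbatim from the commands above, while `build_cmd` itself is a hypothetical helper, not part of the LSDO codebase:

```python
# Sketch: assemble the training commands shown above.
# build_cmd is a hypothetical helper; the flags come from this README.
def build_cmd(dataset_name, noclip_lr, msrvtt_train_file=None):
    cmd = [
        "python", "train.py",
        "--exp_name=exp_name",
        "--videos_dir=videos_dir",
        "--scene_type=average",
        "--batch_size=32",
        f"--noclip_lr={noclip_lr}",
        f"--dataset_name={dataset_name}",
    ]
    # Only MSR-VTT distinguishes between the 9k and 7k train splits.
    if msrvtt_train_file is not None:
        cmd.append(f"--msrvtt_train_file={msrvtt_train_file}")
    return " ".join(cmd)

print(build_cmd("MSRVTT", "3e-5", "9k"))
print(build_cmd("MSRVTT", "1e-5", "7k"))
print(build_cmd("MSVD", "1e-5"))
```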

Evaluation

You can evaluate a trained checkpoint with the following commands:

# MSR-VTT-9k
python train.py --exp_name=exp_name --videos_dir=videos_dir --scene_type=average --batch_size=32 --load_epoch=-1 --dataset_name=MSRVTT --msrvtt_train_file=9k

# MSR-VTT-7K
python train.py --exp_name=exp_name --videos_dir=videos_dir --scene_type=average --batch_size=32 --load_epoch=-1 --dataset_name=MSRVTT --msrvtt_train_file=7k

# MSVD
python train.py --exp_name=exp_name --videos_dir=videos_dir --scene_type=average --batch_size=32 --load_epoch=-1 --dataset_name=MSVD

Citation

If you find this work useful in your research, please cite the following paper:

# BibTeX
@ARTICLE{10841928,
  author={Zheng, Yanwei and Huang, Bowen and Chen, Zekai and Yu, Dongxiao},
  journal={IEEE Transactions on Image Processing}, 
  title={Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects}, 
  year={2025},
  volume={34},
  number={},
  pages={581-593},
  keywords={Feature extraction;Semantics;Transformers;Visualization;Prototypes;Computational modeling;Aggregates;Indexes;Encoding;Context modeling;Text-video retrieval;low-salient but discriminative objects;cross-modal attention},
  doi={10.1109/TIP.2025.3527369}}

# GB/T 7714
[1] Zheng Y, Huang B, Chen Z, et al. Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects[J]. IEEE Transactions on Image Processing, 2025, 34: 581-593. DOI: 10.1109/TIP.2025.3527369.

# MLA
[1] Zheng, Yanwei, et al. "Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects." IEEE Transactions on Image Processing (2025).

# APA
[1] Zheng, Y., Huang, B., Chen, Z., & Yu, D. (2025). Enhancing text-video retrieval performance with low-salient but discriminative objects. IEEE Transactions on Image Processing, 34, 581-593.

Acknowledgement

Our codebase is built on X-Pool.
