merlot

MERLOT: Multimodal Neural Script Knowledge Models

MERLOT (NeurIPS 2021) is a model for learning what we are calling "neural script knowledge" -- representations about what is going on in videos, spanning multiple video frames with associated captions.

Visit our project page at rowanzellers.com/merlot, or read the full paper to learn more.

What's here

We are releasing the following:

Code for the MERLOT model (in model/, with data processing in data/)
Code for running MERLOT over visual story ordering.

We plan to release:

Information about the videos used in this work
Code for adapting the model to other tasks (not strictly needed, but just to make things easier)

This is somewhat ongoing -- we hope to make it easier to adapt MERLOT to other tasks. Please follow if interested!

Environment and setup

There are three different ways of running MERLOT right now:

Pretraining on videos: This requires a TPU pod.
Finetuning on downstream tasks: We did this on TPU v3-8 machines. In theory you can do this on GPUs, but this isn't tested or officially supported right now.
Zero-shot visual story ordering: I have code for this on a TPU, but you should be able to do this on a GPU too.

conda create --name merlot python=3.7 && conda activate merlot
conda install -y python=3.7 tqdm numpy pyyaml scipy ipython cython typing h5py pandas

# If running on GPU
pip install tensorflow-gpu==1.15.5
# If running on TPU
pip install tensorflow==1.15.5

pip install --upgrade google-api-python-client oauth2client boto3 cloud-tpu-profiler regex opencv-python-headless Pillow seaborn
pip install numpy==1.17.0
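
After running the install commands above, it can help to confirm that the environment actually resolved. The following is a minimal sketch, not part of the repository: the helper names `missing_modules` and `python_version_ok` are hypothetical, and the module list is simply the import names of the packages installed above (for example, pyyaml imports as `yaml`, opencv-python-headless as `cv2`, and Pillow as `PIL`).

```python
import importlib.util
import sys

# Import names for the packages installed by the commands above.
# This list is an illustration; trim or extend it to match your setup.
REQUIRED_MODULES = [
    "tensorflow", "numpy", "yaml", "scipy", "h5py",
    "pandas", "cv2", "PIL", "regex", "tqdm",
]


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.7, as the conda command requests."""
    return tuple(version_info[:2]) == (3, 7)


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: expected Python 3.7, found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates modules without importing them, so it does not verify that TensorFlow can see a GPU or TPU.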

Pretraining from scratch

This requires a large TPU pod for data-parallelism.

First, you'll need to get a bunch of training data in "tfrecord" format -- see data processing in data/ for that. You'll then need to adjust the configuration of model/configs/merlot.yaml accordingly. You'll also need to add in your output path (where you want your newly pretrained model to be saved).
Next, in the model directory, run python train.py configs/merlot.yaml
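
The launch step above can be wrapped in a small pre-flight script so that a missing config fails fast rather than partway into a TPU job. This is a rough sketch under stated assumptions: `train_command` and `launch_pretraining` are hypothetical helpers that are not part of the repository, and the only things taken from this README are the `model` directory, `train.py`, and the `configs/merlot.yaml` argument.

```python
import subprocess
import sys
from pathlib import Path


def train_command(config="configs/merlot.yaml"):
    """Build the command from the step above: python train.py configs/merlot.yaml."""
    return [sys.executable, "train.py", config]


def launch_pretraining(repo_root, config="configs/merlot.yaml"):
    """Run train.py from inside the model directory after checking the inputs exist."""
    model_dir = Path(repo_root) / "model"
    for required in (model_dir / "train.py", model_dir / config):
        if not required.is_file():
            raise FileNotFoundError(f"Expected file not found: {required}")
    # check=True raises CalledProcessError if training exits with a non-zero status.
    return subprocess.run(train_command(config), cwd=str(model_dir), check=True)
```

Calling `launch_pretraining(".")` from the repository root would then be equivalent to changing into the model directory and running the command by hand. It does not validate the contents of the config, so the tfrecord paths and output path still need to be set manually.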

Finetuning on downstream tasks

You can download our checkpoint using download_checkpoint.py. There are two options -- we used a checkpoint with 4 frame-caption segments for general purpose pretraining, and then we trained it for longer (using 5 frame-caption segments) to adapt to the story ordering task.

We suggest using the 4 segments checkpoint because that's what we used for all of our finetuning experiments. This corresponds to the configuration model/merlot.yaml.
Actual finetuning code TBD -- you just create a MerlotModel (see model/modeling.py), set up your finetuning task (usually involving an additional output layer), and finetune.

Bibtex

@inproceedings{zellersluhessel2021merlot,
  title={MERLOT: Multimodal Neural Script Knowledge Models},
  author={Zellers, Rowan and Lu, Ximing and Hessel, Jack and Yu, Youngjae and Park, Jae Sung and Cao, Jize and Farhadi, Ali and Choi, Yejin},
  booktitle={Advances in Neural Information Processing Systems 34},
  year={2021}
}
