[docs] Update UNITER project doc

Update doc to include VILLA citation and feature discrepancy explanation.

ghstack-source-id: 2cb291c61b4d1b6120b05630f629f0f19d732c4e
Pull Request resolved: #1176
Ryan-Qiyu-Jiang committed Dec 13, 2021
1 parent c54ac48 commit ed0608e
Showing 1 changed file with 22 additions and 1 deletion.
23 changes: 22 additions & 1 deletion website/docs/projects/uniter.md
Computer Vision, 2020b. ([arXiv](https://arxiv.org/pdf/1909.11740))
}
```

This repository contains the checkpoint for the PyTorch implementation of the VILLA model, originally released in this [repo](https://github.com/zhegan27/VILLA). Please cite the following paper if you are using the VILLA model from MMF:

* Gan, Z., Chen, Y. C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). *Large-scale adversarial training for vision-and-language representation learning.* arXiv preprint arXiv:2006.06195. ([arXiv](https://arxiv.org/abs/2006.06195))
```
@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}
```

## Installation

Follow installation instructions in the [documentation](https://mmf.readthedocs.io/en/latest/notes/installation.html).

## Training

UNITER uses image region features extracted by [BUTD](https://github.com/peteanderson80/bottom-up-attention).
These differ from the features extracted in MMF and used by default in our datasets.
Support for BUTD feature extraction through PyTorch in MMF is in the works.
In the meantime, this means that the UNITER and VILLA checkpoints, which are pretrained on BUTD features,
do not work out of the box on the image region features in MMF.
You can still finetune these checkpoints in MMF on the Faster R-CNN features used in MMF datasets for comparable performance;
this is what is done by default.
Alternatively, you can download BUTD features for the dataset you're working with and change the dataset configuration in MMF to use them, as sketched below.
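
For illustration, a command-line override pointing the VQA2.0 dataset at locally downloaded BUTD features might look like the sketch below. This is an assumption-laden example, not a tested recipe: the `dataset_config.vqa2.features.*` keys follow MMF's usual dataset config layout, and the `/path/to/butd/...` paths are placeholders.
```
# Sketch: point the VQA2 dataset at local BUTD features instead of the
# default MMF-extracted ones (keys and paths are assumptions/placeholders).
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml \
    run_type=train_val dataset=vqa2 model=uniter \
    dataset_config.vqa2.features.train=/path/to/butd/train_features.lmdb \
    dataset_config.vqa2.features.val=/path/to/butd/val_features.lmdb \
    dataset_config.vqa2.features.test=/path/to/butd/test_features.lmdb
```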

To train a fresh UNITER model on the VQA2.0 dataset, run the following command
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter
```

To finetune a pretrained UNITER model on the VQA2.0 dataset,
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained
```
The finetuning configs for VQA2 are from the UNITER base 4-GPU [configs](https://github.com/ChenRocks/UNITER/blob/master/config/train-vqa-base-4gpu.json). For an example finetuning config with a smaller batch size, consider using the ViLT VQA2 training configs; however, this may yield slightly lower performance.
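
If you would rather keep the UNITER configs, a smaller batch size can also be set directly on the command line. A minimal sketch, assuming MMF's standard `training.batch_size` override; the value shown is an arbitrary example, not a recommendation:
```
# Sketch: finetune from the pretrained checkpoint with a reduced batch size
# (the batch size value is an arbitrary example).
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained training.batch_size=64
```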

To finetune a pretrained [VILLA](https://arxiv.org/pdf/2006.06195.pdf) model on the VQA2.0 dataset,
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=villa.pretrained
```

To pretrain UNITER on the masked COCO dataset, run the following command
```
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter
```


Based on the config used and the value of `do_pretraining` defined in it, the model either uses the pretraining recipe described in the UNITER paper or is finetuned on downstream tasks.
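
As an illustration, the flag can also be set from the command line. A sketch only: the exact key path `model_config.uniter.do_pretraining` is an assumption about where the flag lives in the config tree, so check the project configs before relying on it.
```
# Sketch: explicitly enable the UNITER pretraining recipe on masked COCO
# (the model_config.uniter.do_pretraining key path is an assumption).
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter model_config.uniter.do_pretraining=True
```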
