[docs] Update UNITER project doc

Update doc to include VILLA citation and feature discrepancy explanation.

ghstack-source-id: 2cb291c61b4d1b6120b05630f629f0f19d732c4e
Pull Request resolved: #1176
Ryan-Qiyu-Jiang committed Dec 13, 2021
1 parent c54ac48 commit ed0608e
Showing 1 changed file with 22 additions and 1 deletion.
23 changes: 22 additions & 1 deletion website/docs/projects/uniter.md
Computer Vision, 2020b. ([arXiv](https://arxiv.org/pdf/1909.11740))
}
```

This repository contains the checkpoint for the PyTorch implementation of the VILLA model, originally released in this [repo](https://github.com/zhegan27/VILLA). Please cite the following paper if you are using the VILLA model from MMF:

* Gan, Z., Chen, Y. C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). *Large-scale adversarial training for vision-and-language representation learning.* arXiv preprint arXiv:2006.06195. ([arXiv](https://arxiv.org/abs/2006.06195))
```
@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}
```

## Installation

Follow installation instructions in the [documentation](https://mmf.readthedocs.io/en/latest/notes/installation.html).

## Training

UNITER uses image region features extracted by [BUTD](https://github.com/peteanderson80/bottom-up-attention).
These differ from the features extracted in MMF and used by default in our datasets.
Support for BUTD feature extraction through PyTorch in MMF is in the works.
In the meantime, this means that the UNITER and VILLA checkpoints, which are pretrained on BUTD features,
do not work out of the box on the image region features in MMF.
You can still finetune these checkpoints in MMF on the Faster R-CNN features used in MMF datasets for comparable performance;
this is what is done by default.
Alternatively, you can download BUTD features for the dataset you're working with and change the dataset configuration in MMF to use them, as sketched below.
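
For illustration, a command-line override pointing the VQA2.0 dataset at locally downloaded BUTD features might look like the sketch below. This is an assumption-laden example, not a tested recipe: the `dataset_config.vqa2.features.*` keys follow MMF's usual dataset config layout, and the `/path/to/butd/...` paths are placeholders.
```
# Sketch: point the VQA2 dataset at local BUTD features instead of the
# default MMF-extracted ones (keys and paths are assumptions/placeholders).
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml \
    run_type=train_val dataset=vqa2 model=uniter \
    dataset_config.vqa2.features.train=/path/to/butd/train_features.lmdb \
    dataset_config.vqa2.features.val=/path/to/butd/val_features.lmdb \
    dataset_config.vqa2.features.test=/path/to/butd/test_features.lmdb
```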

To train a fresh UNITER model on the VQA2.0 dataset, run the following command
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter
```

To finetune a pretrained UNITER model on the VQA2.0 dataset,
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained
```
The finetuning configs for VQA2 are from the UNITER base 4-GPU [configs](https://github.com/ChenRocks/UNITER/blob/master/config/train-vqa-base-4gpu.json). For an example finetuning config with a smaller batch size, consider using the ViLT VQA2 training configs; however, this may yield slightly lower performance.
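
If you would rather keep the UNITER configs, a smaller batch size can also be set directly on the command line. A minimal sketch, assuming MMF's standard `training.batch_size` override; the value shown is an arbitrary example, not a recommendation:
```
# Sketch: finetune from the pretrained checkpoint with a reduced batch size
# (the batch size value is an arbitrary example).
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained training.batch_size=64
```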

To finetune a pretrained [VILLA](https://arxiv.org/pdf/2006.06195.pdf) model on the VQA2.0 dataset,
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=villa.pretrained
```

To pretrain UNITER on the masked COCO dataset, run the following command
```
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter
```


Based on the config used and the value of `do_pretraining` defined in it, the model either uses the pretraining recipe described in the UNITER paper or is finetuned on downstream tasks.
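
As an illustration, the flag can also be set from the command line. A sketch only: the exact key path `model_config.uniter.do_pretraining` is an assumption about where the flag lives in the config tree, so check the project configs before relying on it.
```
# Sketch: explicitly enable the UNITER pretraining recipe on masked COCO
# (the model_config.uniter.do_pretraining key path is an assumption).
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter model_config.uniter.do_pretraining=True
```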
