Download the dataset
Image source | Download path |
---|---|
COCO 2014 images | images captions |
COCO VQA | vqa train vqa val |
Visual Genome | images part1 images part2 image meta data |
TextCaps | images annotations |
RefCOCO | annotations |
RefCOCO+ | annotations |
RefCOCOg | annotations |
OKVQA | annotations |
AOK-VQA | annotations |
OCR-VQA | annotations |
GQA | images annotations |
Filtered flickr-30k | annotations |
Multi-task conversation | annotations |
Filtered unnatural instruction | annotations |
LLaVA | Complex reasoning Detailed description Conversation |
Download the COCO 2014 images and captions
coco 2014 images path
${MINIGPTv2_DATASET}
├── coco
│   ├── images
...
coco caption annotation path
${MINIGPTv2_DATASET}
├── coco_captions
│   └── annotations
│       ├── coco_karpathy_train.json
...
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the coco_karpathy_train.json path
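Before editing the configs, it can help to confirm that the files are where the layout above says they should be. The following is a minimal sketch, not part of the project: `missing_paths` is a hypothetical helper name, and `/path/to/datasets` is a placeholder used only when `MINIGPTv2_DATASET` is not set.

```shell
# Hypothetical helper: print each expected path that does not exist yet.
# The first argument is the dataset root; the rest are paths relative to it.
# Note: it sets the plain shell variables "root" and "rel".
missing_paths() {
    root=$1
    shift
    for rel in "$@"; do
        [ -e "$root/$rel" ] || printf '%s\n' "$root/$rel"
    done
}

# Check the COCO layout described above. The fallback root is a placeholder.
missing_paths "${MINIGPTv2_DATASET:-/path/to/datasets}" \
    coco/images \
    coco_captions/annotations/coco_karpathy_train.json
```

An empty output means both paths exist; any printed line is a path that still needs to be downloaded or moved into place.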
Download the vqa v2 train and validation json files
├── ${MINIGPTv2_DATASET}
│   ├── vqav2
│   │   ├── vqa_train.json
│   │   ├── vqa_val.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the vqa_train.json and vqa_val.json path
Download the Visual Genome images and annotation files
${MINIGPTv2_DATASET}
├── visual_genome
│   ├── VG_100K
│   ├── VG_100K_2
│   ├── region_descriptions.json
│   └── image_data.json
...
Set image_path to the visual_genome folder. Similarly, set ann_path to the visual_genome folder.
Download the TextCaps images and annotation files
├── ${MINIGPTv2_DATASET}
│   ├── textcaps
│   │   ├── train_images
│   │   ├── TextCaps_0.1_train.json
Set image_path to the TextCaps train_images folder. Similarly, set ann_path to the TextCaps_0.1_train.json path
Download the RefCOCO, RefCOCO+, RefCOCOg annotation files
${MINIGPTv2_DATASET}
├── refcoco_annotations
│   ├── refcoco
│   │   ├── instances.json
│   │   ├── refs(google).p
│   │   └── refs(unc).p
│   ├── refcoco+
│   │   ├── instances.json
│   │   └── refs(unc).p
│   └── refcocog
│       ├── instances.json
│       ├── refs(google).p
│       └── refs(umd).p
...
Set image_path to the COCO 2014 image folder. Similarly, set ann_path in each of the following configs to the refcoco_annotations folder above, which contains refcoco, refcoco+, and refcocog:
- minigpt4/configs/datasets/coco_bbox/refcoco.yaml
- minigpt4/configs/datasets/coco_bbox/refcocog.yaml
- minigpt4/configs/datasets/coco_bbox/refcocop.yaml
- minigpt4/configs/datasets/coco_bbox/invrefcoco.yaml
- minigpt4/configs/datasets/coco_bbox/invrefcocog.yaml
- minigpt4/configs/datasets/coco_bbox/invrefcocop.yaml
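Editing six configs by hand is error-prone, so the same change can be scripted. The sketch below assumes that each config contains `image_path:` and `ann_path:` keys on their own lines and that it is run from the repository root. `set_yaml_value` is a hypothetical helper, not something the project provides, and it does a plain text substitution rather than real YAML parsing.

```shell
# Hypothetical helper: rewrite every "key: value" line in a file, keeping the
# original indentation. Assumes the new value contains no "|", "&", or
# backslash characters (they are special in the sed replacement).
set_yaml_value() {
    file=$1 key=$2 value=$3
    sed "s|^\([[:space:]]*\)${key}:.*|\1${key}: ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Point all six RefCOCO configs at the same image and annotation folders.
for name in refcoco refcocog refcocop invrefcoco invrefcocog invrefcocop; do
    cfg="minigpt4/configs/datasets/coco_bbox/${name}.yaml"
    [ -f "$cfg" ] || continue   # skip when not run from the repository root
    set_yaml_value "$cfg" image_path "${MINIGPTv2_DATASET:?set MINIGPTv2_DATASET first}/coco/images"
    set_yaml_value "$cfg" ann_path "${MINIGPTv2_DATASET}/refcoco_annotations"
done
```

The helper writes to a temporary file and renames it instead of using `sed -i`, whose syntax differs between GNU and BSD sed. Review the resulting diff before training, since any commented-out line that starts with the same key would be rewritten as well.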
Download the OKVQA annotation files
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── okvqa
│   │   ├── okvqa_train.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the location of the OKVQA dataset
Download the AOK-VQA annotation dataset
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p "${AOKVQA_DIR}"
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C "${AOKVQA_DIR}"
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── aokvqa
│   │   ├── aokvqa_v1p0_train.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the location of the AOKVQA dataset
Download the OCR-VQA annotation files, then download the images with the loadDataset.py script
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── ocrvqa
│   │   ├── images
│   │   ├── dataset.json
Set image_path as the ocrvqa/images folder. Similarly, set ann_path to the dataset.json
Download the GQA annotation files and images
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── gqa
│   │   ├── images
│   │   ├── train_balanced_questions.json
Set image_path as the gqa/images folder. Similarly, set ann_path to the train_balanced_questions.json
Download the filtered Flickr-30k images (fill out the form on the official website, or download them from Kaggle) and the annotation files
${MINIGPTv2_DATASET}
├── filtered_flickr
│   ├── images
│   ├── captiontobbox.json
│   ├── groundedcaption.json
│   └── phrasetobbox.json
...
Set image_path as the flickr-30k images folder. Similarly, set ann_path to groundedcaption.json, captiontobbox.json, and phrasetobbox.json for the grounded image caption, caption to bbox, and phrase to bbox datasets, respectively.
- minigpt4/configs/datasets/flickr/default.yaml
- minigpt4/configs/datasets/flickr/caption_to_phrase.yaml
- minigpt4/configs/datasets/flickr/object_to_phrase.yaml
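The text lists three annotation files and three configs but does not spell out which file goes with which config. The sketch below pairs them in the order they are listed, which is an assumption to be checked against the dataset name inside each config; `flickr_ann_for_config` is a hypothetical helper name.

```shell
# Hypothetical helper: print the annotation file assumed to belong to a
# Flickr config. The pairing follows the order of the lists above and is an
# assumption, not something verified against the configs themselves.
flickr_ann_for_config() {
    case $1 in
        default.yaml)           echo groundedcaption.json ;;
        caption_to_phrase.yaml) echo captiontobbox.json ;;
        object_to_phrase.yaml)  echo phrasetobbox.json ;;
        *) echo "unknown config: $1" >&2; return 1 ;;
    esac
}

# Print the assumed config-to-annotation pairing, one pair per line.
for cfg in default.yaml caption_to_phrase.yaml object_to_phrase.yaml; do
    printf '%s -> %s\n' "minigpt4/configs/datasets/flickr/$cfg" \
        "filtered_flickr/$(flickr_ann_for_config "$cfg")"
done
```

All three configs share the same image_path, so only ann_path differs between them.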
Download the multi-task conversation dataset
Location_you_like
${MINIGPTv2_DATASET}
├── multitask_conversation
│   └── multitask_conversation.json
...
Set image_path as the COCO 2014 images folder. Similarly, set ann_path to the multitask_conversation.json file path
Download the filtered unnatural instruction annotation files (we remove the very long sentences from the original unnatural instruction dataset)
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── unnatural_instructions
│   │   ├── filtered_unnatural_instruction.json
This dataset has no images, so there is no image_path to set. Set ann_path to the filtered_unnatural_instruction.json file path
Download the LLaVA annotation files (complex reasoning, detailed description, and conversation)
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── llava
│   │   ├── conversation_58k.json
│   │   ├── detail_23k.json
│   │   ├── complex_reasoning_77k.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the location of the previously downloaded conversation_58k.json, detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively.
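Since every dataset in this section lives under the same root, the empty folder skeleton can be created in one step before downloading anything. This is a sketch under stated assumptions: `make_dataset_skeleton` is a hypothetical helper, the directory names are copied from the trees in this section, and folders that come out of the downloaded archives (for example VG_100K or train_images) are deliberately not created.

```shell
# Hypothetical helper: create the empty directory skeleton under a root.
# Directory names are copied from the trees in this section; folders that
# are unpacked from downloaded archives are left for the archives to create.
make_dataset_skeleton() {
    root=$1
    for dir in \
        coco/images \
        coco_captions/annotations \
        vqav2 \
        visual_genome \
        textcaps \
        refcoco_annotations/refcoco \
        "refcoco_annotations/refcoco+" \
        refcoco_annotations/refcocog \
        okvqa \
        aokvqa \
        ocrvqa/images \
        gqa/images \
        filtered_flickr/images \
        multitask_conversation \
        unnatural_instructions \
        llava
    do
        mkdir -p "$root/$dir" || return 1
    done
}

# Example call, commented out so that sourcing this file has no side effects:
# make_dataset_skeleton "${MINIGPTv2_DATASET}"
```

Because `mkdir -p` leaves existing directories untouched, running the helper again after some datasets have been downloaded is harmless.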