This project combines the two most prevalent vision-language tasks, referring expression comprehension (REC) followed by visual question answering (VQA), REC2VQA for short. I first finetuned VL-BERT on VQAv2 and RefCOCO to obtain two independent checkpoints, and then developed a demonstration web UI for this new two-stage task based on Vue and Django. To hide the long model-loading time, the models are loaded in advance and inference is requested asynchronously through Redis and RabbitMQ.
We recommend using Docker to install and deploy this demonstrative VL-BERT app.
Before deploying with Docker, we need to manually set up the Django database:
cd ./django/
pip install -r requirements.txt
python manage.py makemigrations api
python manage.py migrate
These commands are needed because the `./django` directory is bind-mounted to the container's `/work` directory, so every change you make to the repository code is reflected inside the container.
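For illustration, the relevant fragment of `docker-compose.yml` would look roughly like the sketch below; this assumes the service is named `django` as in the service list further down, and is not a verbatim copy of the repository's compose file:

```yaml
services:
  django:
    image: mrxir/rec2vqa:django
    # Bind-mount the host's ./django directory to /work in the container,
    # so edits to the repository code take effect without rebuilding.
    volumes:
      - ./django:/work
```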
We use Vue3 + ElementPlus + TypeScript for frontend user interface development, and use the `node` Docker image to build and deploy this Vue app.
Here are two ways to obtain and deploy the frontend Docker image used in this repository:
- Run `cd ./vue/app && docker build -t mrxir/rec2vqa:vue .`, or just uncomment the `build: ./vue/app` line in `docker-compose.yml` before running `docker compose up -d`.
- Run `docker pull mrxir/rec2vqa:vue`, or just run `docker compose up -d` directly.
Directly running `docker compose up -d` may be the best option.
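Concretely, the first option toggles a line like the following in `docker-compose.yml`; this is a sketch of the idea, and the exact fragment in the repository may differ:

```yaml
services:
  vue:
    image: mrxir/rec2vqa:vue
    # Uncomment to build the image locally from ./vue/app instead of
    # pulling mrxir/rec2vqa:vue from Docker Hub:
    # build: ./vue/app
    ports:
      - "80:80"
```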
We use Django + Redis + RabbitMQ for backend data interface development, and use the `python3` Docker image to build the Django app environment. As mentioned above, you can just run `docker compose up -d` directly, or build or pull the image yourself.
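As a sketch of how the backend relates to the broker and cache (whether the repository's compose file declares it exactly this way is an assumption on my part):

```yaml
services:
  django:
    image: mrxir/rec2vqa:django
    ports:
      - "8080:8080"
    # The API hands inference jobs to RabbitMQ and exchanges results
    # through Redis, so both should be up before the backend starts.
    depends_on:
      - redis
      - rabbitmq
```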
We use the `nvidia-cuda` Docker image to build the dated environment that VL-BERT requires, based on Ubuntu16.04-Cuda9-Cudnn7-Gcc4.9.3-Pytorch1.1.0-Torchvision0.3.0-Python3.6. As mentioned above, you can just run `docker compose up -d` directly, or build or pull the image yourself.
Note: if you want to build the VL-BERT Docker image yourself, you must follow this wiki to make sure Docker can access your NVIDIA GPUs at build time; otherwise the image you build will not run correctly at the compose stage.
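For reference, one common way to expose NVIDIA GPUs to a compose service is the `deploy.resources` reservation sketched below; whether the repository's compose file uses this form or the older `runtime: nvidia` is an assumption:

```yaml
services:
  vlbert-recworker:
    image: mrxir/rec2vqa:vlbert
    deploy:
      resources:
        reservations:
          devices:
            # Reserve all available NVIDIA GPUs for model inference.
            - driver: nvidia
              count: all
              capabilities: [gpu]
```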
In addition to the above Docker images, there are a few other files to obtain: `./vlbert/docker_build`, which holds the VL-BERT image build requirements (optional), and `./vlbert/(data|ckpts|model)`, which hold the VQA- and REC-finetuned weights, the cached VL-BERT module weights, and the datasets for downstream-task finetuning (optional). Here is the aliyunpan link. After downloading these files, place them at the corresponding paths so that they are mounted correctly into the container workspace during `docker compose up -d`.
The easiest and best way to deploy this codebase is simply to run `docker compose up -d` in the repository root directory.
The compose deployment uses five Docker images and six services.
Docker images:
- `redis:7.0.5-alpine3.16` (public Docker repository)
- `rabbitmq:latest` (public Docker repository)
- `mrxir/rec2vqa:vue` (built and pushed by me to docker.io/mrxir/rec2vqa:vue)
- `mrxir/rec2vqa:django` (built and pushed by me to docker.io/mrxir/rec2vqa:django)
- `mrxir/rec2vqa:vlbert` (built and pushed by me to docker.io/mrxir/rec2vqa:vlbert)
Docker services:
- `redis`: deployed at 6379, open to all local-network IPs
- `rabbitmq`: deployed at 5672, open to all local-network IPs
- `vue`: deployed at 80, open to all local-network IPs
- `django`: deployed at 8080, open to all local-network IPs
- `vlbert-recworker`: deployed after `rabbitmq` finishes booting (see the sketch below)
- `vlbert-vqaworker`: deployed after `rabbitmq` finishes booting (see the sketch below)
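The "deployed after `rabbitmq` finishes booting" ordering is the kind of thing compose expresses with `depends_on`; a minimal sketch follows (the repository's actual file might additionally use a healthcheck with `condition: service_healthy` to wait for the broker to fully boot):

```yaml
services:
  rabbitmq:
    image: rabbitmq:latest
  vlbert-vqaworker:
    image: mrxir/rec2vqa:vlbert
    # Do not start the worker until the rabbitmq container is running.
    depends_on:
      - rabbitmq
```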
After deploying, you can visit `http://$YOUR_LOCAL_IP/#/app/Main` for the Vue frontend interface and `http://$YOUR_LOCAL_IP:8080` for the Django backend data API. If you deploy on a remote server, replace `YOUR_LOCAL_IP` with `YOUR_REMOTE_IP`.
I deploy it on my local-network server, with NAT traversal provided by a Cloudflare Zero-Trust Tunnel.
The repository is organized as follows:
.
├── assets # static resources
│ ├── data_flow.png
│ ├── demo.mp4
│ ├── logo.png
│ ├── presentation.pptx
│ ├── sys_arch.png
│ └── thesis.pdf
├── django # backend django api
│ ├── api # main django app
│ ├── backend # django configurations
│ ├── db # sqlite database
│ ├── Dockerfile # django docker build file
│ ├── manage.py # django main program
│ ├── media # django host static files path
│ ├── recworker.py # referring expression comprehension asynchronous worker
│ ├── requirements.txt # python dependencies
│ └── vqaworker.py # visual question answering asynchronous worker
├── docker-compose.yml # docker compose configuration file
├── README.md
├── vlbert # vision-language large model for VQA and REC
│ ├── cfgs
│ ├── ckpts
│ ├── common
│ ├── data
│ ├── Dockerfile
│ ├── external
│ ├── figs
│ ├── LICENSE
│ ├── model
│ ├── pretrain
│ ├── README.md
│ ├── refcoco
│ ├── requirements.txt
│ ├── scripts
│ ├── vcr
│ ├── viz
│ └── vqa
└── vue # frontend vue app
└── app
21 directories, 17 files
This repository is developed based mainly on VL-BERT (for PyTorch model finetuning and inference) and the GradCam demo & MAttNet demo (for combining Redis, RabbitMQ, and Django to request model inference asynchronously, with realtime communication over WebSockets).