Scene graph extraction aims not only to detect the objects in a given image, but also to classify the relationships between pairs of them.
This example reproduces Graphical Contrastive Losses for Scene Graph Parsing; the authors' code can be found here.
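To make the task concrete, the sketch below shows one possible way to represent a scene graph in plain Python: a list of detected objects plus a list of relations that point at them by index. The class names (`DetectedObject`, `Relation`, `SceneGraph`) and the example values are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedObject:
    label: str                               # object class name
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    score: float                             # detection confidence


@dataclass
class Relation:
    subject: int    # index into SceneGraph.objects
    predicate: str  # relationship class name
    object: int     # index into SceneGraph.objects
    score: float    # relationship confidence


@dataclass
class SceneGraph:
    objects: List[DetectedObject]
    relations: List[Relation]

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return (subject label, predicate, object label) for every relation."""
        return [
            (self.objects[r.subject].label, r.predicate, self.objects[r.object].label)
            for r in self.relations
        ]


# Made-up example: two detected objects and one relation between them.
graph = SceneGraph(
    objects=[
        DetectedObject("man", (10.0, 20.0, 110.0, 300.0), 0.98),
        DetectedObject("horse", (60.0, 120.0, 400.0, 380.0), 0.95),
    ],
    relations=[Relation(subject=0, predicate="riding", object=1, score=0.87)],
)
print(graph.triplets())  # [('man', 'riding', 'horse')]
```

Object detection fills in `objects`; relationship classification fills in `relations`.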
VisualGenome
Model | Backbone | mAP@50 | SGDET@20 | SGDET@50 | SGDET@100 | PHRCLS@20 | PHRCLS@50 | PHRCLS@100 | PREDCLS@20 | PREDCLS@50 | PREDCLS@100 |
---|---|---|---|---|---|---|---|---|---|---|---|
RelDN, L0 | ResNet101 | 29.5 | 22.65 | 30.02 | 35.04 | 32.84 | 35.60 | 36.26 | 60.58 | 65.53 | 66.51 |
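The `@20`, `@50`, and `@100` columns are assumed here to be recall-at-K scores, as is common in scene graph benchmarks; the README itself does not define them. Under that assumption, the ranking and counting step can be sketched as follows. This is a simplified illustration that matches triplets by label only and ignores bounding-box overlap; `triplet_recall_at_k` is a hypothetical helper, not the code in `utils/metric.py`.

```python
def triplet_recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets found among the k highest-scoring predictions.

    predictions:  list of (score, (subject, predicate, object)) tuples
    ground_truth: set of (subject, predicate, object) tuples
    """
    if not ground_truth:
        return 0.0
    # Keep only the k most confident predictions.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    top_k = {triplet for _, triplet in ranked}
    return len(top_k & ground_truth) / len(ground_truth)


# Made-up annotations and predictions for a single image.
gt = {("man", "riding", "horse"), ("man", "wearing", "hat")}
preds = [
    (0.9, ("man", "riding", "horse")),
    (0.6, ("horse", "on", "grass")),
    (0.4, ("man", "wearing", "hat")),
]
print(triplet_recall_at_k(preds, gt, k=1))  # 0.5
print(triplet_recall_at_k(preds, gt, k=3))  # 1.0
```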
This implementation is based on GluonCV. Install GluonCV with:
pip install gluoncv --upgrade
The implementation contains the following files:
.
|-- data
| |-- dataloader.py
| |-- __init__.py
| |-- object.py
| |-- prepare_visualgenome.py
| `-- relation.py
|-- demo_reldn.py
|-- model
| |-- faster_rcnn.py
| |-- __init__.py
| `-- reldn.py
|-- README.md
|-- train_faster_rcnn.py
|-- train_faster_rcnn.sh
|-- train_freq_prior.py
|-- train_reldn.py
|-- train_reldn.sh
|-- utils
| |-- build_graph.py
| |-- __init__.py
| |-- metric.py
| |-- sampling.py
| `-- viz.py
|-- validate_reldn.py
`-- validate_reldn.sh
- The folder `data` contains the data preparation script and the dataset definitions for object detection and scene graph extraction.
- The folder `model` contains the model definitions.
- The folder `utils` contains helper functions for training, validation, and visualization.
- The script `train_faster_rcnn.py` trains a Faster R-CNN model on the VisualGenome dataset, and `train_faster_rcnn.sh` includes preset parameters.
- The script `train_freq_prior.py` computes the frequency counts used for RelDN model training.
- The script `train_reldn.py` trains a RelDN model, and `train_reldn.sh` includes preset parameters.
- The script `validate_reldn.py` validates the trained Faster R-CNN and RelDN models, and `validate_reldn.sh` includes preset parameters.
- The script `demo_reldn.py` uses trained parameters to extract a scene graph from an arbitrary input image.
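The README does not show what `train_freq_prior.py` computes internally. One common form of a frequency prior counts how often each predicate occurs for a given (subject class, object class) pair in the training annotations. The sketch below illustrates that idea under this assumption; the function names and the toy annotations are hypothetical.

```python
from collections import Counter, defaultdict


def count_predicate_frequencies(triplets):
    """Count predicates per (subject class, object class) pair."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triplets:
        counts[(subj, obj)][pred] += 1
    return counts


def predicate_prior(counts, subj, obj):
    """Turn the raw counts for one class pair into relative frequencies."""
    pair_counts = counts.get((subj, obj))
    if not pair_counts:
        return {}  # class pair never seen in the annotations
    total = sum(pair_counts.values())
    return {pred: n / total for pred, n in pair_counts.items()}


# Made-up training annotations.
annotations = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "feeding", "horse"),
    ("man", "wearing", "hat"),
]
counts = count_predicate_frequencies(annotations)
prior = predicate_prior(counts, "man", "horse")
print(prior)
```

In this toy example, "riding" accounts for two of the three annotations between a man and a horse, so it gets the larger prior.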
The steps below describe how to train your own models. We also provide pretrained model files for validation and the demo.
We provide scripts to download and prepare the VisualGenome dataset. Run the preparation with:
python data/prepare_visualgenome.py
First, train the object detection model on VisualGenome:
bash train_faster_rcnn.sh
It runs for about 20 hours on a machine with 64 CPU cores and 8 V100 GPUs.
With a trained Faster R-CNN model, start training the RelDN model with:
bash train_reldn.sh
It runs for about 2 days on a single GPU with 8 CPU cores.
After training, evaluate the results with several commonly used metrics:
bash validate_reldn.sh
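The steps above can be chained in a small driver. The sketch below is one possible way to do so with the standard library, run from the example's root directory. `PIPELINE` and `run_pipeline` are hypothetical names, and `train_freq_prior.py` is left out because this README does not show how it is invoked. By default the driver only prints the commands.

```python
import subprocess

# Commands exactly as given in this README, in the order they are described.
PIPELINE = [
    ["python", "data/prepare_visualgenome.py"],  # download and prepare the dataset
    ["bash", "train_faster_rcnn.sh"],            # train the object detector
    ["bash", "train_reldn.sh"],                  # train RelDN
    ["bash", "validate_reldn.sh"],               # evaluate the trained models
]


def run_pipeline(steps=PIPELINE, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    executed = []
    for cmd in steps:
        print("$ " + " ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Call `run_pipeline(dry_run=False)` to actually execute the steps; given the training times quoted above, running them one at a time is usually more practical.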
We provide a demo script, `demo_reldn.py`, for running the model on real-world pictures. Note that the demo needs a trained model to generate meaningful results; if none is available, the script downloads the pretrained model automatically.