Skip to content

Latest commit

 

History

History

scenegraph

Scene Graph Extraction

Scene graph extraction aims at not only detect objects in the given image, but also classify the relationships between pairs of them.

This example reproduces Graphical Contrastive Losses for Scene Graph Parsing, author's code can be found here.

DEMO

Results

VisualGenome

Model Backbone mAP@50 SGDET@20 SGDET@50 SGDET@100 PHRCLS@20 PHRCLS@50 PHRCLS@100 PREDCLS@20 PREDCLS@50 PREDCLS@100
RelDN, L0 ResNet101 29.5 22.65 30.02 35.04 32.84 35.60 36.26 60.58 65.53 66.51

Preparation

This implementation is based on GluonCV. Install GluonCV with

pip install gluoncv --upgrade

The implementation contains the following files:

.
|-- data
|   |-- dataloader.py
|   |-- __init__.py
|   |-- object.py
|   |-- prepare_visualgenome.py
|   `-- relation.py
|-- demo_reldn.py
|-- model
|   |-- faster_rcnn.py
|   |-- __init__.py
|   `-- reldn.py
|-- README.md
|-- train_faster_rcnn.py
|-- train_faster_rcnn.sh
|-- train_freq_prior.py
|-- train_reldn.py
|-- train_reldn.sh
|-- utils
|   |-- build_graph.py
|   |-- __init__.py
|   |-- metric.py
|   |-- sampling.py
|   `-- viz.py
|-- validate_reldn.py
`-- validate_reldn.sh
  • The folder data contains the data preparation script, and definition of datasets for object detection and scene graph extraction.
  • The folder model contains model definition.
  • The folder utils contains helper functions for training, validation, and visualization.
  • The script train_faster_rcnn.py trains a Faster R-CNN model on VisualGenome dataset, and train_faster_rcnn.sh includes preset parameters.
  • The script train_freq_prior.py trains the frequency counts for RelDN model training.
  • The script train_reldn.py trains a RelDN model, and train_reldn.sh includes preset parameters.
  • The script validate_reldn.py validate the trained Faster R-CNN and RelDN models, and validate_reldn.sh includes preset parameters.
  • The script demo_reldh.py makes use of trained parameters and extract an scene graph from an arbitrary input image.

Below are further steps on training your own models. Besides, we also provide pretrained model files for validation and demo:

  1. Faster R-CNN Model for Object Detection
  2. RelDN Model
  3. Faster R-CNN Model for Edge Feature

Data preparation

We provide scripts to download and prepare the VisualGenome dataset. One can run with

python data/prepare_visualgenome.py

Object Detector

First one need to train the object detection model on VisualGenome.

bash train_faster_rcnn.sh

It runs for about 20 hours on a machine with 64 CPU cores and 8 V100 GPUs.

Training RelDN

With a trained Faster R-CNN model, one can start the training of RelDN model by

bash train_reldn.sh

It runs for about 2 days with one single GPU and 8 CPU cores.

Validate RelDN

After the training, one can evaluate the results with multiple commonly-used metrics:

bash validate_reldn.sh

Demo

We provide a demo script of running the model with real-world pictures. Be aware that you need trained model to generate meaningful results from the demo, otherwise the script will download the pre-trained model automatically.