This is the official implementation of PSNet (Neurocomputing), an object detection method that is robust to perspective variance. For more details, please refer to:
PSNet: Perspective-sensitive convolutional network for object detection [Paper]
Xin Zhang, Yicheng Liu, Chunlei Huo, Nuo Xu, Lingfeng Wang, Chunhong Pan
Clone the code
git clone
Download the model weights used in the paper to /models/res101/spacenet/:
- Python 2.7 or 3.6
- PyTorch 0.4.0 (0.4.1 and higher are not currently supported)
- CUDA 8.0 or higher
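The version constraints listed above can be checked before building anything. The following is a minimal sketch and not part of the repository; the helper names `python_ok` and `torch_ok` are hypothetical, and the handling of a local build suffix such as `+cu90` is an assumption.

```python
import sys


def python_ok(version_info=sys.version_info):
    """Return True for Python 2.7 or 3.6, the versions listed above."""
    return tuple(version_info[:2]) in {(2, 7), (3, 6)}


def torch_ok(version_string):
    """Return True only for PyTorch 0.4.0 (0.4.1 and higher are unsupported)."""
    # Assumption: drop a local build suffix such as "+cu90" before comparing.
    base = version_string.split("+")[0]
    return base == "0.4.0"
```

With PyTorch installed, `torch_ok(torch.__version__)` reports whether the installed release matches the supported one.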
We used ResNet101 in our experiments. You can download the pretrained model from:
Download the model and put it into data/pretrained_model/.
As pointed out by ruotianluo/pytorch-faster-rcnn, choose the -arch value that matches your GPU to compile the CUDA code:
| GPU model | Architecture |
| --- | --- |
| TitanX (Maxwell/Pascal) | sm_52 |
| GTX 960M | sm_50 |
| GTX 1080 (Ti) | sm_61 |
| Grid K520 (AWS g2.2xlarge) | sm_30 |
| Tesla K80 (AWS p2.xlarge) | sm_37 |
More details about setting the architecture can be found here or here
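As an illustration of the lookup described above, the sketch below maps a GPU model to the corresponding `nvcc` `-arch` option. The dictionary only contains the pairs from the table; the names `GPU_ARCH` and `arch_flag` are hypothetical and do not exist in the repository.

```python
# GPU-to-architecture pairs copied from the table above.
GPU_ARCH = {
    "TitanX (Maxwell/Pascal)": "sm_52",
    "GTX 960M": "sm_50",
    "GTX 1080 (Ti)": "sm_61",
    "Grid K520 (AWS g2.2xlarge)": "sm_30",
    "Tesla K80 (AWS p2.xlarge)": "sm_37",
}


def arch_flag(gpu_model):
    """Return the nvcc -arch option for a GPU model listed in the table."""
    try:
        return "-arch=" + GPU_ARCH[gpu_model]
    except KeyError:
        # GPUs outside the table need to be looked up separately.
        raise ValueError("GPU model not in the table: %r" % (gpu_model,))
```

For a GPU that is not in the table, the function raises `ValueError` rather than guessing an architecture.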
Install all the python dependencies using pip:
pip install -r requirements.txt
Compile the CUDA dependencies using the following commands:
cd lib
This compiles all the modules you need, including NMS, ROI_Pooling, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7; if you are using a different Python version, please compile the modules yourself.
As pointed out in this issue, if you encounter errors during compilation, you may have forgotten to export the CUDA paths to your environment.
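One way to see whether the CUDA paths are exported is to inspect `PATH` and `LD_LIBRARY_PATH`. The sketch below is not part of the repository; the function name `cuda_env_report` is hypothetical, and it assumes that the CUDA install directory contains the string "cuda" (for example /usr/local/cuda), which may not hold on every system.

```python
import os


def cuda_env_report(env=None):
    """List the entries of PATH and LD_LIBRARY_PATH that look like CUDA directories."""
    env = os.environ if env is None else env
    report = {}
    for var in ("PATH", "LD_LIBRARY_PATH"):
        # Split the variable into its individual directories, skipping empty entries.
        entries = [p for p in env.get(var, "").split(os.pathsep) if p]
        # Assumption: a CUDA directory has "cuda" somewhere in its path.
        report[var] = [p for p in entries if "cuda" in p.lower()]
    return report
```

An empty list for either variable suggests that the corresponding CUDA directory has not been exported in the current shell.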
Before training, set the directories used to save and load the trained models. Change the "save_dir" and "load_dir" arguments to match your environment.
To train a Faster R-CNN model with vgg16 on pascal_voc, simply run:
--dataset pascal_voc --net vgg16 \
--lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
where "bs" is the batch size (default 1). Alternatively, to train with resnet101 on pascal_voc, simply run:
--dataset pascal_voc --net res101 \
--lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
Above, BATCH_SIZE and WORKER_NUMBER can be set according to your GPU memory size. On a Titan Xp with 12 GB of memory, the batch size can be up to 4.
If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:
python --dataset pascal_voc --net vgg16 \
--bs 24 --nw 8 \
--lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
--cuda --mGPUs
Change dataset to "coco" or "vg" if you want to train on COCO or Visual Genome.
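When launching several runs with different datasets or batch sizes, it can help to assemble the flags programmatically. The sketch below builds only the flag list that appears in the commands above; the function name `training_args` is hypothetical, and the training script name is deliberately left out because it has to be supplied by the caller.

```python
def training_args(dataset, net, batch_size, workers, lr, lr_decay_step,
                  cuda=True, multi_gpu=False):
    """Build the flag list used in the training commands (script name excluded)."""
    args = [
        "--dataset", dataset,
        "--net", net,
        "--bs", str(batch_size),
        "--nw", str(workers),
        "--lr", str(lr),
        "--lr_decay_step", str(lr_decay_step),
    ]
    if cuda:
        args.append("--cuda")
    if multi_gpu:
        # Only added for multi-GPU training, as in the 8-GPU example.
        args.append("--mGPUs")
    return args
```

The returned list can be appended to the interpreter and script path and passed to `subprocess.run`; the learning rate and decay step are whatever values you would otherwise put in $LEARNING_RATE and $DECAY_STEP.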
If you want to evaluate the detection performance of a pre-trained vgg16 model on the pascal_voc test set, simply run:
python --dataset pascal_voc --net vgg16 \
--checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
Specify the model session, checkepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.
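To make the role of the three flags concrete, here is a minimal sketch of how they could be parsed with `argparse`. It is not the repository's actual parser: the function name `parse_checkpoint_args`, the integer types, and marking the flags as required are all assumptions made for illustration.

```python
import argparse


def parse_checkpoint_args(argv):
    """Parse the three flags that identify a trained model."""
    parser = argparse.ArgumentParser(description="Select a trained model to load")
    # Assumption: all three values are integers and must be given explicitly.
    parser.add_argument("--checksession", type=int, required=True,
                        help="training session of the model")
    parser.add_argument("--checkepoch", type=int, required=True,
                        help="epoch of the model")
    parser.add_argument("--checkpoint", type=int, required=True,
                        help="checkpoint of the model")
    return parser.parse_args(argv)
```

Calling it with the example values, `parse_checkpoint_args(["--checksession", "1", "--checkepoch", "6", "--checkpoint", "416"])`, yields a namespace whose three attributes hold 1, 6 and 416.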
To run detection on your own images with a pre-trained model, first download the pretrained model listed above or train your own model, then add the images to the folder $ROOT/images and run:
python --net vgg16 \
--checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
--cuda --load_dir path/to/model/directory
Then you will find the detection results in folder $ROOT/images.
title={PSNet: Perspective-sensitive convolutional network for object detection},
author={Zhang, Xin and Liu, Yicheng and Huo, Chunlei and Xu, Nuo and Wang, Lingfeng and Pan, Chunhong},