Yafei Yang, Bo Yang
Project Page | Paper
This repository contains:
- Complexity Factors Calculation for Datasets under `Complexity_Factors/`.
- Six Datasets Generation / Adaptation under `Dataset_Generation/`, including:
  - dSprites;
  - Tetris;
  - CLEVR;
  - YCB;
  - ScanNet;
  - COCO.
- Four Representative Methods Re-implementation / Adaptation, including:
  - AIR ("Attend, Infer, Repeat: Fast Scene Understanding with Generative Models") under `AIR/`;
  - MONet ("MONet: Unsupervised Scene Decomposition and Representation") under `MONet/`;
  - IODINE ("Multi-Object Representation Learning with Iterative Variational Inference") under `IODINE/`;
  - Slot Attention ("Object-Centric Learning with Slot Attention") under `Slot_Attention/`.
- Evaluation of Object Segmentation Performance under `Segmentation_Evaluation/`, including:
  - AP score;
  - PQ score;
  - Precision and Recall.
IJCV extension contains:
- Additional Complexity Factors Calculation for Background under `Complexity_Factors/`.
- MOVi Datasets Generation under `Dataset_Generation/MOVi`.
- Background Complexity Factors Adaptation under `Dataset_Generation/Ablation Dataset`.
- Additional Baseline DINOSAUR ("Bridging the Gap to Real-World Object-Centric Learning").
- Additional Evaluation Metrics under `Segmentation_Evaluation/`, including:
  - ARI;
  - ARP;
  - ARR;
  - Background Recall.
conda env create -f [env_name].yml
conda activate [env_name]
Note: Since this repo contains implementations of different approaches, we use separate conda environments to manage them. Specifically, use `tf1_env.yml` to build the environment for IODINE, `tf2_env.yml` for Slot Attention, and `pytorch_env.yml` for AIR and MONet.
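The method-to-environment mapping in the note above can be written down as a small lookup. This is a minimal sketch, not part of the repository: the dictionary and the helper name `conda_create_command` are hypothetical, and only the three `.yml` file names come from the note.

```python
# Mapping taken from the note above. ENV_FILES and conda_create_command
# are hypothetical names used for illustration only.
ENV_FILES = {
    "IODINE": "tf1_env.yml",
    "Slot Attention": "tf2_env.yml",
    "AIR": "pytorch_env.yml",
    "MONet": "pytorch_env.yml",
}


def conda_create_command(method):
    """Return the `conda env create` command (as an argv list) for a method."""
    if method not in ENV_FILES:
        raise KeyError(f"unknown method: {method!r}")
    return ["conda", "env", "create", "-f", ENV_FILES[method]]


if __name__ == "__main__":
    # Print one command per method so they can be copied into a shell.
    for method in ENV_FILES:
        print(f"{method}: {' '.join(conda_create_command(method))}")
```

AIR and MONet share one environment file, so the environment only needs to be created once for both.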
Datasets used in this paper can be downloaded here. We provide both TFRecord and PNG files for each dataset. Alternatively, you can generate the datasets by following the instructions below.
Download the raw dSprites shape data from https://github.com/deepmind/dsprites-dataset. Put the downloaded `dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz` under `Dataset_Generation/dSprites`.
Create our dSprites dataset using the given shape data with:
cd Dataset_Generation
python dSprites/create_dsprites_dataset.py --n_imgs [num_imgs] --root [dSprites_location] --min_object_count 2 --max_object_count 6
This will create `[num_imgs]` images and their corresponding masks under `[dSprites_location]/image` and `[dSprites_location]/mask`.
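As a quick sanity check after generation, the output layout described above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `check_dataset_layout` is hypothetical, and it assumes one mask file per image, which the generation scripts may not guarantee.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Check that root/image and root/mask exist and hold equally many files.

    Hypothetical helper. Assumes one mask file per image (an assumption,
    not something the generation scripts are known to guarantee).
    Returns the number of images found.
    """
    root = Path(root)
    image_dir = root / "image"
    mask_dir = root / "mask"
    for directory in (image_dir, mask_dir):
        if not directory.is_dir():
            raise FileNotFoundError(f"missing directory: {directory}")
    n_images = sum(1 for p in image_dir.iterdir() if p.is_file())
    n_masks = sum(1 for p in mask_dir.iterdir() if p.is_file())
    if n_images != n_masks:
        raise ValueError(f"{n_images} images but {n_masks} masks under {root}")
    return n_images

# Example call, with the same placeholder as the command above:
#   check_dataset_layout("[dSprites_location]")
```

The same check applies to any dataset root that follows the `image`/`mask` layout.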
Download the Tetrominoes dataset from https://github.com/deepmind/multi_object_datasets. Put the downloaded `tetrominoes_train.tfrecords` under `Dataset_Generation/Tetris`.
Parse the TFRecord data into images with:
cd Dataset_Generation
python Tetris/read_tetris_tfrecords.py
This will create 10000 images of resolution 35x35 from the Tetrominoes dataset under `Tetris/tetris_source`.
Create our Tetris dataset using the previously parsed images with:
python Tetris/create_tetris_dataset.py --n_imgs [num_imgs] --root [Tetris_location] --min_object_count 2 --max_object_count 6
This will create `[num_imgs]` images and their corresponding masks under `[Tetris_location]/image` and `[Tetris_location]/mask`.
Clone the repo https://github.com/facebookresearch/clevr-dataset-gen, follow its instructions, and render CLEVR images with:
cd image_generation
blender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6
If you have an NVIDIA GPU with CUDA installed, you can use the GPU to accelerate rendering:
blender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6 --use_gpu 1
Put the rendered images and masks under `Dataset_Generation/CLEVR/clevr_source/images` and `Dataset_Generation/CLEVR/clevr_source/masks`.
Create our CLEVR dataset using the previously rendered images with:
python CLEVR/create_clevr_dataset.py --n_imgs [num_imgs] --root [CLEVR_location] --min_object_count 2 --max_object_count 6
This will create `[num_imgs]` images and their corresponding masks under `[CLEVR_location]/image` and `[CLEVR_location]/mask`.
Download the 256-G video-YCB dataset from https://rse-lab.cs.washington.edu/projects/posecnn/. Put it under `Dataset_Generation/YCB/YCB_Video_Dataset`.
Create our YCB dataset using the raw video-YCB images with:
python YCB/create_YCB_dataset.py --n_imgs [num_imgs] --root [YCB_location] --min_object_count 2 --max_object_count 6
This will create `[num_imgs]` images and their corresponding masks under `[YCB_location]/image` and `[YCB_location]/mask`.
Download the ScanNet data and put it under `Dataset_Generation/ScanNet/scannet_raw`.
Process the ScanNet data into `Dataset_Generation/ScanNet/scans_processed` with:
python ScanNet/process_scannet_data.py
This will parse 2D images from the ScanNet sensor data, unzip the raw 2D instance labels (filtered version) in ScanNet, and parse the official train/val split downloaded from https://github.com/ScanNet/ScanNet/tree/master/Tasks/Benchmark.
Create our ScanNet dataset using the processed ScanNet data with:
python ScanNet/create_ScanNet_dataset.py --n_imgs [num_imgs] --root [ScanNet_location] --min_object_count 2 --max_object_count 6
This will create `[num_imgs]` images and their corresponding masks under `[ScanNet_location]/image` and `[ScanNet_location]/mask`.
Download the COCO data from http://images.cocodataset.org/zips/val2017.zip (validation), http://images.cocodataset.org/zips/train2017.zip (train) and http://images.cocodataset.org/annotations/annotations_trainval2017.zip (annotations). Put them under `Dataset_Generation/COCO/COCO_raw`.
Parse the segmentation masks from the annotation file with:
python COCO/process_coco_dataset.py
Create our COCO dataset using the original COCO images and the parsed masks with:
python YCB/create_ScanNet_dataset.py --n_imgs [num_imgs] --root [COCO_location] --min_object_count 2 --max_object_count 6
This will create `[num_imgs]` images and their corresponding masks under `[COCO_location]/image` and `[COCO_location]/mask`.
Details for MOVi-C and MOVi-E datasets can be found at https://github.com/google-research/kubric/tree/main/challenges/movi. They can be directly loaded with:
import tensorflow_datasets as tfds
ds = tfds.load("movi_c/128x128", data_dir="gs://kubric-public/tfds")
ds = tfds.load("movi_e/128x128", data_dir="gs://kubric-public/tfds")
Images and masks in PNG format can be parsed with:
python MOVi/movi_c_128.py
python MOVi/movi_e_128.py
- Use `Dataset_Generation/Ablation Dataset/object_level_ablation.py` to create datasets ablated on object-level factors.
- Use `Dataset_Generation/Ablation Dataset/scene_level_ablation.py` to create datasets ablated on scene-level factors.
- Use `Dataset_Generation/Ablation Dataset/joint_ablation.py` to create datasets ablated on both object-level and scene-level factors.
- Use `Dataset_Generation/Ablation Dataset/bg_ablation.py` to create datasets ablated on background factors.
Detailed examples and usage can be found in the corresponding scripts.
Training:
cd AIR/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --max_steps 6
Testing:
cd AIR/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --eval_mode --resume [ckpt]
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- `ckpt` is the checkpoint to be resumed in the testing stage.
- In all experiments for AIR, we set `max_steps` to 6.
Training:
cd MONet/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7
Testing:
cd MONet/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7 --eval_mode --resume [ckpt]
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- `ckpt` is the checkpoint to be resumed in the testing stage.
- In all experiments for MONet, we set `K_steps` to 7.
Training:
cd IODINE/
CUDA_VISIBLE_DEVICES=[gpu_id] python main.py -f with [dataset_name_train]
Testing:
cd IODINE/
CUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset_identifier [dataset_name_test]
where:
- `dataset_name_train` is the name of the training dataset, e.g. dSprites_train, YCB_train.
- `dataset_name_test` is the name of the testing dataset, e.g. dSprites_test, YCB_test.
- `gpu_id` is the target CUDA device id.
Training:
cd Slot_Attention/
CUDA_VISIBLE_DEVICES=[gpu_id] python train.py --dataset [dataset_name] --num_slots 7
Testing:
cd Slot_Attention/
CUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset [dataset_name] --num_slots 7
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- In all experiments for Slot Attention, we set `num_slots` to 7.
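When running several methods over several datasets, the training and testing commands listed above for AIR, MONet, IODINE and Slot Attention can be assembled programmatically. The following is a minimal sketch, not part of the repository: the function name `build_command` and its return format are hypothetical, while the scripts, flags and fixed values (`--max_steps 6`, `--K_steps 7`, `--num_slots 7`) are copied from the commands above.

```python
def build_command(method, dataset, gpu_id, evaluate=False, ckpt=None):
    """Return (workdir, argv, env_overrides) for one training or testing run.

    Hypothetical helper mirroring the README commands. For IODINE, pass the
    train or test dataset name (e.g. dSprites_train / dSprites_test) yourself.
    """
    gpu = str(gpu_id)
    if method in ("AIR", "MONet"):
        # AIR and MONet select the GPU through --gpu_index.
        argv = ["python", "main.py", "--dataset", dataset, "--gpu_index", gpu]
        if method == "MONet":
            argv += ["--K_steps", "7"]      # used for both training and testing
        elif not evaluate:
            argv += ["--max_steps", "6"]    # AIR training only
        if evaluate:
            if ckpt is None:
                raise ValueError("testing requires a checkpoint path")
            argv += ["--eval_mode", "--resume", ckpt]
        return method + "/", argv, {}
    # IODINE and Slot Attention select the GPU through CUDA_VISIBLE_DEVICES.
    env = {"CUDA_VISIBLE_DEVICES": gpu}
    if method == "IODINE":
        if evaluate:
            argv = ["python", "eval.py", "--dataset_identifier", dataset]
        else:
            argv = ["python", "main.py", "-f", "with", dataset]
        return "IODINE/", argv, env
    if method == "Slot Attention":
        script = "eval.py" if evaluate else "train.py"
        argv = ["python", script, "--dataset", dataset, "--num_slots", "7"]
        return "Slot_Attention/", argv, env
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    for method in ("AIR", "MONet", "IODINE", "Slot Attention"):
        workdir, argv, env = build_command(method, "dSprites", 0)
        print(workdir, env, " ".join(argv))
```

The returned triple can be passed to `subprocess.run(argv, cwd=workdir, env={**os.environ, **env})` from the repository root, with the matching conda environment activated.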
We use the official repo for all experiments on DINOSAUR. Code and instructions can be found at https://github.com/amazon-science/object-centric-learning-framework. Examples are as follows:
Training:
CUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_train +experiment=projects/bridging/dinosaur/movi_c_feat_rec
Testing:
CUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_eval +evaluation=projects/bridging/metrics_coco +train_config_name=config +train_config_path=[config path]
where:
- `gpu_id` is the target CUDA device id.
- `config path` is the path to the DINOSAUR configurations.
Calculate object-level and scene-level complexity factors with `Complexity_Factors/Complexity_Factor_Evaluator.py`. Examples are provided in that script.
If you find our work useful in your research, please consider citing:
@article{yang2022,
title={Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images},
author={Yang, Yafei and Yang, Bo},
journal={NeurIPS},
year={2022}
}
@article{yang2024benchmarking,
title={Benchmarking and Analysis of Unsupervised Object Segmentation from Real-World Single Images},
author={Yang, Yafei and Yang, Bo},
journal={International Journal of Computer Vision},
volume={132},
number={6},
pages={2077--2113},
year={2024},
publisher={Springer}
}
- 5/10/2022: Initial release!
- 18/10/2024: Content related to IJCV extension has been included in this repo!
This project references the following repositories:
- https://pyro.ai/examples/air.html
- https://github.com/addtt/attend-infer-repeat-pytorch
- https://github.com/applied-ai-lab/genesis
- https://github.com/deepmind/deepmind-research/tree/master/iodine
- https://github.com/google-research/google-research/tree/master/slot_attention
- https://github.com/google-research/kubric/tree/main/challenges/movi
- https://github.com/amazon-science/object-centric-learning-framework