
A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds


oliver-lemke/spot-compose


Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds

ICRA 2024 Mobile Manipulation and Embodied Intelligence Workshop (MOMA.v2)
Best Paper

Oliver Lemke [1], Zuria Bauer [1], René Zurbrügg [1], Marc Pollefeys [1,2], Francis Engelmann [1,*], Hermann Blum [1,*]

[1] ETH Zurich, [2] Microsoft Mixed Reality & AI Labs, [*] Equal contribution

Spot-Compose presents a comprehensive framework for integrating modern machine perception techniques with Spot, demonstrated through experiments on object grasping and dynamic drawer manipulation.

[Figure: teaser]

[Project Webpage] [Paper] [Teaser Video]

News 📰

  • April 23rd, 2024: Teaser video released.
  • April 22nd, 2024: Paper released on arXiv.
  • March 13th, 2024: Code released.

Code Structure 🎬

spot-compose/
├── source/                            # All source code
│   ├── utils/                         # General utility functions
│   │   ├── coordinates.py             # Coordinate calculations (poses, translations, etc.)
│   │   ├── docker_communication.py    # Communication with docker servers
│   │   ├── environment.py             # API keys, env variables
│   │   ├── files.py                   # File system handling
│   │   ├── graspnet_interface.py      # Communication with graspnet server
│   │   ├── importer.py                # Config-based importing
│   │   ├── mask3D_interface.py        # Handling of Mask3D instance segmentation
│   │   ├── point_clouds.py            # Point cloud computations
│   │   ├── recursive_config.py        # Recursive configuration files
│   │   ├── scannet_200_labels.py      # ScanNet200 labels (for Mask3D)
│   │   ├── singletons.py              # Singletons for globally unique access
│   │   ├── user_input.py              # Handle user input
│   │   ├── vis.py                     # Handle visualizations
│   │   ├── vitpose_interface.py       # Handle communication with ViTPose docker server
│   │   └── zero_shot_object_detection.py # Object detections from images
│   ├── robot_utils/                   # Utility functions specific to Spot functionality
│   │   ├── base.py                    # Framework and wrapper for all scripts
│   │   ├── basic_movements.py         # Basic robot commands (moving body / arm, stowing, etc.)
│   │   ├── advanced_movements.py      # Advanced robot commands (planning, complex movements)
│   │   ├── frame_transformer.py       # Simplified transformation between frames of reference
│   │   ├── video.py                   # Handle actions that require access to robot cameras
│   │   └── graph_nav.py               # Handle actions that require access to GraphNav service
│   └── scripts/
│       ├── my_robot_scripts/
│       │   ├── estop_nogui.py         # E-Stop
│       │   └── ...                    # Other action scripts
│       └── point_cloud_scripts/
│           ├── extract_point_cloud.py # Extract point cloud from Boston Dynamics autowalk
│           ├── full_align.py          # Align autowalk and scanned point cloud
│           └── vis_ply_point_clouds_with_coordinates.py # Visualize aligned point cloud
├── data/
│   ├── autowalk/                      # Raw autowalk data
│   ├── point_clouds/                  # Extracted point clouds from autowalks
│   ├── prescans/                      # Raw prescan data
│   ├── aligned_point_clouds/          # Prescan point clouds aligned with extracted autowalk clouds
│   └── masked/                        # Mask3D output given aligned point clouds
├── configs/                           # Configuration files
│   └── config.yaml                    # Uppermost level of recursive configurations (see the Config section for more info)
├── shells/
│   ├── estop.sh                       # E-Stop script
│   ├── mac_routing.sh                 # Set up networking on a Mac workstation
│   ├── ubuntu_routing.sh              # Set up networking on an Ubuntu workstation
│   ├── robot_routing.sh               # Set up networking on the NUC
│   └── start.sh                       # Convenient script execution
├── README.md                          # Project documentation
├── requirements.txt                   # pip requirements file
├── pyproject.toml                     # Formatter and linter specs
└── LICENSE

Dependencies 📝

The main dependencies of the project are the following:

python: 3.8

You can set up a pip environment as follows:

git clone --recurse-submodules git@github.com:oliver-lemke/spot-compose.git
cd spot-compose
virtualenv --python="/usr/bin/python3.8" "venv/"
source venv/bin/activate
pip install -r requirements.txt
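The environment above assumes Python 3.8. As a small sanity check, a helper like the one below (our own, not part of the repository) can confirm from inside the activated virtualenv that the interpreter matches:

```python
import sys
from typing import Sequence, Tuple

# Version listed under Dependencies.
REQUIRED_PYTHON: Tuple[int, int] = (3, 8)


def has_required_python(version_info: Sequence[int] = sys.version_info,
                        required: Tuple[int, int] = REQUIRED_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info equals `required`."""
    return tuple(version_info[:2]) == required
```

Calling has_required_python() inside venv/ should return True; a False result usually means virtualenv picked up a different interpreter than /usr/bin/python3.8.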

Downloads 💧

The pre-trained model weights for YOLO-based drawer detection are available here.

Docker Containers 🐳

Docker containers are used to run external neural networks. This keeps the setup modular when working with multiple methods and avoids tedious installation. Each docker container functions as a self-contained server that answers requests. Please refer to utils/docker_communication.py for your own custom setup, or to the respective files in utils/ for existing containers.
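As a rough illustration of the client side of this pattern, the sketch below builds and sends a JSON request to one of the containers. It assumes the servers speak HTTP on their published port; the route name and payload schema are hypothetical placeholders, so consult utils/docker_communication.py and the per-model interface files for the format the real servers expect.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(host: str, port: int, route: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request for a model server.

    The route and payload schema are hypothetical; the real servers may
    expect a different encoding (e.g. binary arrays instead of JSON).
    """
    return urllib.request.Request(
        url=f"http://{host}:{port}/{route.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send the request and decode a JSON response body (assumes the server replies with JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect or log a request before a container is even running.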

To run the respective docker container, please first pull the desired image via

docker pull [Link]

Once docker has finished pulling the image, you can start a container via the Run Command. When you are inside the container shell, simply run the Start Command to start the server.

| Name | Link | Run Command | Start Command |
|---|---|---|---|
| AnyGrasp | craiden/graspnet:v1.0 | docker run -p 5000:5000 --gpus all -it craiden/graspnet:v1.0 | python3 app.py |
| OpenMask3D | craiden/openmask:v1.0 | docker run -p 5001:5001 --gpus all -it craiden/openmask:v1.0 | python3 app.py |
| ViTPose | craiden/vitpose:v1.0 | docker run -p 5002:5002 --gpus all -it craiden/vitpose:v1.0 | easy_ViTPose/venv/bin/python app.py |
| DrawerDetection | craiden/yolodrawer:v1.0 | docker run -p 5004:5004 --gpus all -it craiden/yolodrawer:v1.0 | python3 app.py |
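To avoid retyping the table, its images, ports, and commands can be kept in one place and turned into shell commands. This is a hypothetical convenience helper, not something shipped with the repository; base_url additionally assumes the servers are reached over HTTP.

```python
from typing import Dict, NamedTuple


class Service(NamedTuple):
    image: str          # value of the Link column
    port: int           # port published with -p host:container
    start_command: str  # command to run inside the container shell


# Values copied from the table above.
SERVICES: Dict[str, Service] = {
    "AnyGrasp": Service("craiden/graspnet:v1.0", 5000, "python3 app.py"),
    "OpenMask3D": Service("craiden/openmask:v1.0", 5001, "python3 app.py"),
    "ViTPose": Service("craiden/vitpose:v1.0", 5002, "easy_ViTPose/venv/bin/python app.py"),
    "DrawerDetection": Service("craiden/yolodrawer:v1.0", 5004, "python3 app.py"),
}


def pull_command(name: str) -> str:
    """`docker pull` command for the named service."""
    return f"docker pull {SERVICES[name].image}"


def run_command(name: str) -> str:
    """Interactive `docker run` command publishing the service port with GPU access."""
    service = SERVICES[name]
    return f"docker run -p {service.port}:{service.port} --gpus all -it {service.image}"


def base_url(name: str, host: str = "localhost") -> str:
    """Address a client on `host` would use, assuming HTTP (not confirmed by the table)."""
    return f"http://{host}:{SERVICES[name].port}"
```

For example, run_command("AnyGrasp") reproduces the first Run Command in the table.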

Detailed Setup Instructions

Point Clouds ☁️

For this project, we require two point clouds: one for navigation (low resolution, captured by Spot) and one for segmentation (high resolution, captured by a commodity scanner). The former is used for initial localization and for setting the origin at the AprilTag fiducial; the latter is used for accurate segmentation.
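Because both clouds contain the same AprilTag, its pose in each cloud is enough to relate the two frames: if spot_T_fiducial and scan_T_fiducial denote the tag's pose in the Spot frame and in the scanner frame, then spot_T_scan = spot_T_fiducial · scan_T_fiducial⁻¹ maps scanner points into the Spot frame. The pure-Python sketch below illustrates only this frame chaining; it is not the implementation in point_cloud_scripts/full_align.py, which may estimate the poses differently or refine the result further, and all names are our own.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compose two 4x4 homogeneous transforms (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transposed rotation block
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def scan_to_spot(spot_T_fiducial: Matrix, scan_T_fiducial: Matrix) -> Matrix:
    """spot_T_scan = spot_T_fiducial * inverse(scan_T_fiducial)."""
    return matmul(spot_T_fiducial, invert_rigid(scan_T_fiducial))


def apply(t: Matrix, point: Sequence[float]) -> Tuple[float, float, float]:
    """Map a 3D point through a homogeneous transform."""
    homogeneous = list(point) + [1.0]
    x, y, z = (sum(t[i][k] * homogeneous[k] for k in range(4)) for i in range(3))
    return (x, y, z)
```

For instance, a point located exactly at the tag in the scanner cloud is mapped by scan_to_spot followed by apply onto the tag's position in the Spot frame.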

Low-Resolution Spot Point Cloud

To capture the point cloud, position Spot in front of your AprilTag and start the autowalk. Zip the resulting data and unzip it into the data/autowalk folder. Fill in the name of the unzipped folder in the config file under pre_scanned_graphs/low_res.
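Extracting the archive and finding the folder name to enter under pre_scanned_graphs/low_res can be scripted along these lines. This is a sketch with a hypothetical function name, and it assumes the zip contains exactly one top-level folder.

```python
import zipfile
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def unpack_autowalk(archive: PathLike, autowalk_dir: PathLike = "data/autowalk") -> str:
    """Extract a zipped autowalk into data/autowalk and return its top-level folder name.

    The returned name is what goes under pre_scanned_graphs/low_res in the config.
    Assumes the archive contains exactly one top-level folder.
    """
    target = Path(autowalk_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Collect the first path component of every non-empty member name.
        top_level = {Path(name).parts[0] for name in zf.namelist() if name.strip("/")}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level folder in {archive}, found {sorted(top_level)}")
        zf.extractall(target)
    return top_level.pop()
```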

High-Resolution Commodity Point Cloud

To capture the point cloud, we use the 3D Scanner App on iOS. Make sure the fiducial is visible during the scan so it can be used for initialization. Once the scan is complete, click Share and export two things:

  1. All Data
  2. Point Cloud/PLY with the High Density setting enabled and Z axis up disabled

Unzip the All Data zip file into the data/prescans folder. Rename the point cloud to pcd.ply and copy it into the unzipped folder, so that the resulting directory structure looks like the following:

prescans/
├── all_data_folder/
│   ├── annotations.json
│   ├── export.obj
│   ├── export_refined.obj
│   ├── frame_00000.jpg
│   ├── frame_00000.json
│   ├── ...
│   ├── info.json
│   ├── pcd.ply.json
│   ├── textured_output.jpg
│   ├── textured_output.mtl
│   ├── textured_output.obj
│   ├── thumb_00000.jpg
│   └── world_map.arkit

Finally, fill in the name of your all_data_folder in the config file under pre_scanned_graphs/high_res.
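The manual steps for the high-resolution scan can be checked with a few helpers like the ones below. They are illustrative only: the function names are hypothetical, the config is treated as a plain nested dictionary rather than the repository's Config object, and EXPECTED_PRESCAN_FILES is an assumed minimum taken from the tree above, not a list of what the pipeline actually reads.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Mapping, Union

PathLike = Union[str, Path]

# Assumed minimum contents of an All Data folder; the pipeline may read other files too.
EXPECTED_PRESCAN_FILES = ("pcd.ply", "annotations.json", "info.json")


def install_point_cloud(exported_ply: PathLike, all_data_dir: PathLike) -> Path:
    """Copy the exported PLY into the All Data folder under the name pcd.ply."""
    target = Path(all_data_dir) / "pcd.ply"
    shutil.copyfile(exported_ply, target)
    return target


def scan_folders(config: Mapping[str, Mapping[str, str]], data_root: PathLike = "data") -> Dict[str, Path]:
    """Resolve the two scan folders named under pre_scanned_graphs in the config."""
    graphs = config["pre_scanned_graphs"]
    root = Path(data_root)
    return {
        "low_res": root / "autowalk" / graphs["low_res"],
        "high_res": root / "prescans" / graphs["high_res"],
    }


def missing_prescan_files(all_data_dir: PathLike) -> List[str]:
    """List the expected files that are absent from the All Data folder."""
    folder = Path(all_data_dir)
    return [name for name in EXPECTED_PRESCAN_FILES if not (folder / name).is_file()]
```

Running missing_prescan_files on the folder returned by scan_folders(config)["high_res"] before starting the pipeline gives an early, readable error instead of a failure deep inside alignment or segmentation.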

Networking 🌐

In our project setup, we connect the robot via a NUC on Spot's back. The NUC is connected to Spot via cable, and to a router via WiFi.

However, since the robot is not directly reachable from the router, we have to (a) tell the workstation how to route traffic to the robot, and (b) configure the NUC to act as a bridge. You may have to adjust the addresses in the scripts to fit your setup.

Workstation Networking

On the workstation run ./shells/ubuntu_routing.sh (or ./shells/mac_routing.sh depending on your workstation operating system).

NUC Networking

First, SSH into the NUC, then run ./robot_routing.sh to configure the NUC as a network bridge.
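After both routing scripts have run, a quick way to confirm that the workstation can reach the robot through the NUC bridge is to open a TCP connection to it. The helper below is our own; the robot address you pass in is specific to your setup, and probing port 443 assumes Spot's API is served over HTTPS.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name resolution failure, ...
        return False
```

For example, is_reachable("<robot-ip>", 443) should return True from the workstation once routing is in place; False typically points to a missing route or a bridge that is not forwarding.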

Config ⚙️

The base config file can be found under configs/config.yaml. However, our config system also supports extending and inheriting from other configs, which is useful if you have different setups on different workstations. To do this, specify the bottom-most file in the inheritance tree when creating the Config() object. Each config file names the file it inherits from in an extends field.

In our example, the overwriting config is specified in configs/template_extension.yaml, meaning the inheritance graph looks like:

template_extension.yaml ---overwrites---> config.yaml

In this example, we would specify Config(file='configs/template_extension.yaml'); its values then override those of every config file it extends.

However, this functionality is not necessary for this project to work, so simply working with the config.yaml file as you are used to is supported by default.
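Conceptually, the extends mechanism resolves the chain from the bottom-most file up to config.yaml and merges each child over its parent. The sketch below uses plain dictionaries and an injected loader so that it runs without PyYAML; the deep-merge semantics (nested keys merged, other values replaced) and treating extends as a lookup key are assumptions, not a description of recursive_config.py.

```python
from typing import Any, Callable, Dict, Optional, Set

ConfigDict = Dict[str, Any]


def deep_merge(base: ConfigDict, override: ConfigDict) -> ConfigDict:
    """Return base updated with override; nested dicts merge, other values are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(path: str, load: Callable[[str], ConfigDict],
                   _seen: Optional[Set[str]] = None) -> ConfigDict:
    """Resolve `path` by recursively applying it on top of the file named in its `extends` field."""
    seen = set() if _seen is None else _seen
    if path in seen:
        raise ValueError(f"cyclic 'extends' chain involving {path}")
    seen.add(path)
    raw = dict(load(path))  # copy so pop() does not mutate the loaded mapping
    parent = raw.pop("extends", None)
    if parent is None:
        return raw
    return deep_merge(resolve_config(parent, load, seen), raw)
```

With PyYAML installed, a loader such as lambda p: yaml.safe_load(Path(p).read_text()) would let resolve_config("configs/template_extension.yaml", loader) mirror the inheritance graph shown above.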

Benchmark 📈

We provide detailed results here.

Open-Vocabulary Object Retrieval

[Figure: experiments_manipulation]

Dynamic Drawer Manipulation & Search

[Figure: experiments_drawers]

TODO 🔜

  • Finish Documentation

BibTeX 🙏

@inproceedings{lemke2024spotcompose,
  title={Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds},
  author={Oliver Lemke and Zuria Bauer and Ren{\'e} Zurbr{\"u}gg and Marc Pollefeys and Francis Engelmann and Hermann Blum},
  booktitle={2nd Workshop on Mobile Manipulation and Embodied Intelligence at ICRA 2024},
  year={2024},
}
