Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
Oliver Lemke1, Zuria Bauer1, René Zurbrügg1, Marc Pollefeys1,2, Francis Engelmann1,*, Hermann Blum1,*
1ETH Zurich 2Microsoft Mixed Reality & AI Labs *Equal contribution
Spot-Compose presents a comprehensive framework for the integration of modern machine perception techniques with Spot, showing experiments with object grasping and dynamic drawer manipulation.
[Project Webpage] [Paper] [Teaser Video]
- April 23rd 2024: Teaser video released.
- April 22nd 2024: Paper released on arXiv.
- March 13th 2024: Code released.
```
spot-compose/
├── source/                                # All source code
│   ├── utils/                             # General utility functions
│   │   ├── coordinates.py                 # Coordinate calculations (poses, translations, etc.)
│   │   ├── docker_communication.py        # Communication with docker servers
│   │   ├── environment.py                 # API keys, env variables
│   │   ├── files.py                       # File system handling
│   │   ├── graspnet_interface.py          # Communication with graspnet server
│   │   ├── importer.py                    # Config-based importing
│   │   ├── mask3D_interface.py            # Handling of Mask3D instance segmentation
│   │   ├── point_clouds.py                # Point cloud computations
│   │   ├── recursive_config.py            # Recursive configuration files
│   │   ├── scannet_200_labels.py          # ScanNet200 labels (for Mask3D)
│   │   ├── singletons.py                  # Singletons for global unique access
│   │   ├── user_input.py                  # Handle user input
│   │   ├── vis.py                         # Handle visualizations
│   │   ├── vitpose_interface.py           # Handle communications with ViTPose docker server
│   │   └── zero_shot_object_detection.py  # Object detections from images
│   ├── robot_utils/                       # Utility functions specific to Spot functionality
│   │   ├── base.py                        # Framework and wrapper for all scripts
│   │   ├── basic_movements.py             # Basic robot commands (moving body / arm, stowing, etc.)
│   │   ├── advanced_movements.py          # Advanced robot commands (planning, complex movements)
│   │   ├── frame_transformer.py           # Simplified transformation between frames of reference
│   │   ├── video.py                       # Handle actions that require access to robot cameras
│   │   └── graph_nav.py                   # Handle actions that require access to GraphNav service
│   └── scripts/
│       ├── my_robot_scripts/
│       │   ├── estop_nogui.py             # E-Stop
│       │   └── ...                        # Other action scripts
│       └── point_cloud_scripts/
│           ├── extract_point_cloud.py     # Extract point cloud from Boston Dynamics autowalk
│           ├── full_align.py              # Align autowalk and scanned point cloud
│           └── vis_ply_point_clouds_with_coordinates.py  # Visualize aligned point cloud
├── data/
│   ├── autowalk/                          # Raw autowalk data
│   ├── point_clouds/                      # Extracted point clouds from autowalks
│   ├── prescans/                          # Raw prescan data
│   ├── aligned_point_clouds/              # Prescan point clouds aligned with extracted autowalk clouds
│   └── masked/                            # Mask3D output given aligned point clouds
├── configs/                               # Configs
│   └── config.yaml                        # Uppermost level of recursive configurations (see the configs section for more info)
├── shells/
│   ├── estop.sh                           # E-Stop script
│   ├── mac_routing.sh                     # Set up networking on Mac workstation
│   ├── ubuntu_routing.sh                  # Set up networking on Ubuntu workstation
│   ├── robot_routing.sh                   # Set up networking on NUC
│   └── start.sh                           # Convenient script execution
├── README.md                              # Project documentation
├── requirements.txt                       # pip requirements file
├── pyproject.toml                         # Formatter and linter specs
└── LICENSE
```
The main dependencies of the project are the following:
```
python: 3.8
```
You can set up a pip environment as follows:
```bash
git clone --recurse-submodules git@github.com:oliver-lemke/spot-compose.git
cd spot-compose
virtualenv --python="/usr/bin/python3.8" "venv/"
source venv/bin/activate
pip install -r requirements.txt
```
The pre-trained model weights for the YOLO-based drawer detection are available here.
Docker containers are used to run the external neural networks. This allows for easy modularity when working with multiple methods, without tedious setup. Each docker container functions as a self-contained server that answers requests. Please refer to `utils/docker_communication.py` for your own custom setup, or to the respective interface files in `utils/` for the existing containers.
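Purely as a hedged sketch of this request/response pattern (the endpoint name, payload layout, and response format below are assumptions for illustration, not the actual interface used by `docker_communication.py` or the `*_interface.py` files), a client-side call might look like:

```python
# Minimal sketch of querying one of the dockerized model servers.
# The "/predict" endpoint and the JSON payload/response layout are assumptions
# for illustration; see utils/docker_communication.py and the *_interface.py
# files for the request logic actually used in this project.
import numpy as np
import requests


def query_server(points: np.ndarray, port: int = 5000, timeout: float = 30.0) -> dict:
    """Send a point cloud to a locally running model server and return its JSON response."""
    payload = {"points": points.tolist()}
    response = requests.post(f"http://localhost:{port}/predict", json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    dummy_cloud = np.random.rand(1024, 3)  # stand-in for a real point cloud
    print(query_server(dummy_cloud))
```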
To run the respective docker container, please first pull the desired image via

```bash
docker pull <Link>
```

where `<Link>` is the image name from the Link column of the table below. Once docker has finished pulling the image, you can start a container via its Run Command. When you are inside the container shell, simply run the Start Command to start the server.
| Name | Link | Run Command | Start Command |
|---|---|---|---|
| AnyGrasp | craiden/graspnet:v1.0 | `docker run -p 5000:5000 --gpus all -it craiden/graspnet:v1.0` | `python3 app.py` |
| OpenMask3D | craiden/openmask:v1.0 | `docker run -p 5001:5001 --gpus all -it craiden/openmask:v1.0` | `python3 app.py` |
| ViTPose | craiden/vitpose:v1.0 | `docker run -p 5002:5002 --gpus all -it craiden/vitpose:v1.0` | `easy_ViTPose/venv/bin/python app.py` |
| DrawerDetection | craiden/yolodrawer:v1.0 | `docker run -p 5004:5004 --gpus all -it craiden/yolodrawer:v1.0` | `python3 app.py` |
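Before running any scripts, you can quickly sanity-check that the containers are reachable on their ports. The snippet below is only an illustrative helper (it is not part of the repository); the ports are taken from the table above:

```python
# Quick reachability check for the dockerized servers listed above.
# Illustrative helper only -- not part of the repository.
import socket

SERVERS = {
    "AnyGrasp": 5000,
    "OpenMask3D": 5001,
    "ViTPose": 5002,
    "DrawerDetection": 5004,
}

for name, port in SERVERS.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        status = "up" if sock.connect_ex(("localhost", port)) == 0 else "down"
    print(f"{name:16s} (port {port}): {status}")
```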
For this project, we require two point clouds: one for navigation (low resolution, captured by Spot) and one for segmentation (high resolution, captured by a commodity scanner). The former is used for initial localization and for setting the origin at the AprilTag fiducial, the latter for accurate segmentation.

To capture the low-resolution point cloud, position Spot in front of your AprilTag and start the autowalk. Zip the resulting data and unzip it into the `data/autowalk` folder. Fill in the name of the unzipped folder in the config file under `pre_scanned_graphs/low_res`.
To capture the high-resolution point cloud, we use the 3D Scanner App on iOS. Make sure the fiducial is visible during the scan for initialization. Once the scan is complete, click on `Share` and export two things:

1. `All Data`
2. `Point Cloud/PLY`, with the `High Density` setting enabled and `Z axis up` disabled

Unzip the `All Data` zip file into the `data/prescans` folder. Rename the point cloud to `pcd.ply` and copy it into the folder, such that the resulting directory structure looks like the following:
```
prescans/
├── all_data_folder/
│   ├── annotations.json
│   ├── export.obj
│   ├── export_refined.obj
│   ├── frame_00000.jpg
│   ├── frame_00000.json
│   ├── ...
│   ├── info.json
│   ├── pcd.ply
│   ├── pcd.ply.json
│   ├── textured_output.jpg
│   ├── textured_output.mtl
│   ├── textured_output.obj
│   ├── thumb_00000.jpg
│   └── world_map.arkit
```
Finally, fill in the name of your `all_data_folder` in the config file under `pre_scanned_graphs/high_res`.
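The alignment of the autowalk and prescan point clouds is handled by `source/scripts/point_cloud_scripts/full_align.py`. Purely as a hedged sketch of what such a registration step can look like (the file paths and ICP parameters below are placeholder assumptions, not the project's actual settings):

```python
# Illustrative ICP registration of the low-res (autowalk) cloud onto the
# high-res prescan using Open3D. Paths and parameters are placeholders;
# the project's own alignment lives in point_cloud_scripts/full_align.py.
import numpy as np
import open3d as o3d


def align(source_path: str, target_path: str, threshold: float = 0.05) -> np.ndarray:
    """Return the transformation that registers the source cloud onto the target cloud."""
    source = o3d.io.read_point_cloud(source_path)
    target = o3d.io.read_point_cloud(target_path)
    result = o3d.pipelines.registration.registration_icp(
        source,
        target,
        threshold,
        np.eye(4),  # assumes a rough initial alignment (e.g., via the fiducial)
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    return result.transformation


if __name__ == "__main__":
    T = align("data/point_clouds/autowalk.ply", "data/prescans/all_data_folder/pcd.ply")
    print(T)
```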
In our project setup, we connect to the robot via a NUC mounted on Spot's back. The NUC is connected to Spot via cable, and to a router via WiFi. However, since the robot is not directly accessible from the router, we have to (a) tell the workstation where to send information intended for the robot, and (b) tell the NUC to work as a bridge. You may have to adjust the addresses in the scripts to fit your setup.

On the workstation, run `./shells/ubuntu_routing.sh` (or `./shells/mac_routing.sh`, depending on your workstation's operating system). Then, ssh into the NUC and run `./robot_routing.sh` to configure it as a network bridge.
The base config file can be found under `configs/config.yaml`. However, our config system allows for dynamically extending and inheriting from configs, which is useful if you have different setups on different workstations. To do this, simply specify the bottom-most file in the inheritance tree when creating the `Config()` object. Each config file specifies the file it inherits from in an `extends` field.

In our example, the overwriting config is specified in `configs/template_extension.yaml`, meaning the inheritance graph looks like:

```
template_extension.yaml ---overwrites---> config.yaml
```

In this case, we would specify `Config(file='configs/template_extension.yaml')`, which then overwrites all the config files it extends. However, this functionality is not required for the project to work; simply using the `config.yaml` file as you are used to is supported by default.
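Purely as a hedged sketch of how such `extends`-based inheritance can be resolved (assuming the `extends` field holds a path to the parent file and using a shallow merge for brevity; this is not the actual implementation in `utils/recursive_config.py`):

```python
# Illustrative resolution of an "extends"-based config hierarchy:
# the parent is loaded first, then the child's values overwrite it.
# Not the actual implementation in utils/recursive_config.py.
import yaml


def load_config(path: str) -> dict:
    with open(path, "r") as f:
        config = yaml.safe_load(f) or {}
    parent_path = config.pop("extends", None)
    if parent_path is None:
        return config
    merged = load_config(parent_path)  # resolve the parent chain first ...
    merged.update(config)              # ... then let the child overwrite it (shallow merge)
    return merged


if __name__ == "__main__":
    print(load_config("configs/template_extension.yaml"))
```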
We provide detailed results here.
- Finish Documentation
```bibtex
@inproceedings{lemke2024spotcompose,
  title     = {Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds},
  author    = {Oliver Lemke and Zuria Bauer and Ren{\'e} Zurbr{\"u}gg and Marc Pollefeys and Francis Engelmann and Hermann Blum},
  booktitle = {2nd Workshop on Mobile Manipulation and Embodied Intelligence at ICRA 2024},
  year      = {2024},
}
```