Skip to content

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

Notifications You must be signed in to change notification settings

abliao/RoBridge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RoBridge

This repo contains code for the paper:

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

Project Website || Arxiv

Overview

RoBridge adopts a three-layer architecture, consisting of a high-level cognitive planner (HCP), an invariant operable representation (IOR), and a generalist embodied agent (GEA). For example, for the instruction ``Put the blocks into the corresponding shaped slots", HCP will first plan and split the task into multiple primitive actions. Then, combined with the APIs composed of the foundation model, it will give IOR, which mainly includes the masked depth of the first perspective, the mask of the third perspective, the type of action, and the constraints. IOR is updated by HCP at a low frequency and track-anything updates the mask at a high frequency. IOR is used as the input of GEA, and GEA performs specific actions until the task is completed.

Abstract

Operating robots in open-ended scenarios with diverse tasks is a crucial research and application direction in robotics. While recent progress in natural language processing and large multimodal models has enhanced robots' ability to understand complex instructions, robot manipulation still faces the procedural skill dilemma and the declarative skill dilemma in open environments. Existing methods often compromise cognitive and executive capabilities. To address these challenges, in this paper, we propose RoBridge, a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM), an invariant operable representation (IOR) serving as a symbolic bridge, and a generalist embodied agent (GEA). RoBridge maintains the declarative skill of VLM and unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution. RoBridge demonstrates significant performance improvements over existing baselines, achieving a 75% success rate on new tasks and an 83% average success rate in sim-to-real generalization using only five real-world data samples per task. This work represents a significant step towards integrating cognitive reasoning with physical execution in robotic systems, offering a new paradigm for general robotic manipulation.

Installation

  • Install dependencies
git clone https://github.com/abliao/RoBridge
cd RoBridge
path=$(pwd)

pip install -r "$path/real_world/GroundingDINO/requirements.txt"
pip install -r "$path/real_world/Track_Anything/requirements.txt"
cd "$path/gea"
./install.sh
cd "$path"
pip install -r requirements.txt

Training

Please download the expert agents first and put them in gea/expert_local. If you want to train by yourself, you can set the corresponding tasks and train in gea/gea/launch_scripts/metaworld.py.

Then, train GEA in simulator

cd gea
./start.sh

If you want to deploy on a real machine, please first collect some real machine data and place it in the data folder. Then, run the following command:

cd gea
python planseqlearn/launch_scripts/finetune.py

Evaluation

To evaluate in simulation, modify the tasks and checkpoints in gea/gea/launch_scripts/metaworld_eval.py and run

cd gea
./eval.sh

Real World

If you want to run it on a real machine, run

export PYTHONPATH=.:real_world/Track_Anything
python real_world/main.py --task "put task here"

Note for Real World Deployment

We use a kinova Gen3 robot and two D435i cameras. It is recommended that the closer the camera's pose is to the one in the simulation, the better the effect. Please set the camera's coordinate system in config/config.yaml.

Known Issues

If you have problems compiling the metaworld environment, try running

sudo apt-get install build-essential
sudo apt-get install build-essential libgl1-mesa-dev
sudo apt-get install libglew-dev libsdl2-dev libsdl2-image-dev libglm-dev libfreetype6-dev
sudo apt-get install libglfw3-dev libglfw3

Citation

If you find this code useful in your work, please consider citing

Acknowledgements

The project is based on planseqlearn, Track-Anything, and GroundingDINO. Thanks for the authors for their efforts.

About

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published