Commit
Doc and code cleanup (#25)
* clean up doc and code nitting

Signed-off-by: youliang <[email protected]>

* fix links

Signed-off-by: youliang <[email protected]>

---------

Signed-off-by: youliang <[email protected]>
youliangtan authored Feb 14, 2024
1 parent 8ada212 commit 6ed3efb
Showing 20 changed files with 200 additions and 258 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -64,15 +64,15 @@ SERL provides a set of common libraries for users to train RL policies for robot

| Code Directory | Description |
| --- | --- |
| [serl_launcher](./serl_launcher) | Main code for SERL |
| [serl_launcher.agents](./serl_launcher/serl_launcher/agents/) | Agent Policies (e.g. DRQ, SAC, BC) |
| [serl_launcher.wrappers](./serl_launcher/serl_launcher/wrappers) | Gym env wrappers |
| [serl_launcher.data](./serl_launcher/serl_launcher/data) | Replay buffer and data store |
| [serl_launcher.vision](./serl_launcher/serl_launcher/vision) | Vision related models and utils |
| [serl_launcher](https://github.com/rail-berkeley/serl/blob/main/serl_launcher) | Main code for SERL |
| [serl_launcher.agents](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/agents/) | Agent Policies (e.g. DRQ, SAC, BC) |
| [serl_launcher.wrappers](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/wrappers) | Gym env wrappers |
| [serl_launcher.data](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/data) | Replay buffer and data store |
| [serl_launcher.vision](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/vision) | Vision related models and utils |
| [franka_sim](./franka_sim) | Franka mujoco simulation gym environment |
| [serl_robot_infra](./serl_robot_infra/) | Robot infra for running with real robots |
| [serl_robot_infra.robot_servers](./serl_robot_infra/robot_servers/) | Flask server for sending commands to robot via ROS |
| [serl_robot_infra.franka_env](./serl_robot_infra/franka_env/) | Gym env for real franka robot |
| [serl_robot_infra.robot_servers](https://github.com/rail-berkeley/serl/blob/main/serl_robot_infra/robot_servers/) | Flask server for sending commands to robot via ROS |
| [serl_robot_infra.franka_env](https://github.com/rail-berkeley/serl/blob/main/serl_robot_infra/franka_env/) | Gym env for real franka robot |
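
A minimal sketch of how these modules are typically pulled in from an example script. The import paths below are only the ones that already appear in the example files touched by this commit; this is not an exhaustive list of the public API.

```python
# Illustrative imports only -- paths copied from the example scripts changed in this commit.
from serl_launcher.wrappers.chunking import ChunkingWrapper                     # serl_launcher.wrappers
from serl_launcher.data.data_store import MemoryEfficientReplayBufferDataStore  # serl_launcher.data
from serl_launcher.utils.train_utils import concat_batches                      # training utilities

# agentlace provides the actor/learner transport used alongside serl_launcher
from agentlace.trainer import TrainerServer, TrainerClient
from agentlace.data.data_store import QueuedDataStore
```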

## Quick Start with SERL in Sim

12 changes: 9 additions & 3 deletions docs/real_franka.md
@@ -6,10 +6,14 @@ When running with a real robot, a separate gym env is needed. For our examples,

![](./images/robot_infra_interfaces.png)

Follow the [README](serl_robot_infra/README.md) in `serl_robot_infra` for installation and basic robot operation instructions.

### Installation for `serl_robot_infra`

*NOTE: The following code will not run as it is, since it will require custom data, checkpoints, and robot env. We provide the code as a reference for how to use SERL with real robots. Learn this section in incremental order, starting from the first task (peg insertion) to the last task (bin relocation). Modify the code according to your needs. *
Follow the [README](../serl_robot_infra/README.md) in `serl_robot_infra` for installation and basic robot operation instructions. It includes the instructions for installing the impedance-based [serl_franka_controllers](https://github.com/rail-berkeley/serl_franka_controllers).

After the installation, you should be able to run the robot server and interact with the real hardware through the `franka_env` gym environment.

> NOTE: The following example code will not run as-is, since it requires custom data, checkpoints, and a robot env. We provide the code as a reference for how to use SERL with real robots. Work through this section in order, starting from the first task (peg insertion) and ending with the last task (bin relocation), and modify the code according to your needs.
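
As a rough illustration of the workflow above (robot server running in one terminal, the gym env used from your training script), here is a minimal interaction sketch. The env id, the `franka_env` registration side effect, and the classic-gym `reset`/`step` signatures are assumptions for illustration, not code from the repo; adapt them to your installed gym/gymnasium version and task, and only run it with the robot in a safe configuration.

```python
# Hypothetical smoke test -- env id and API details are assumptions, adjust to your setup.
import gym
import franka_env  # noqa: F401 -- assumed to register the real-robot envs on import
from gym.wrappers import RecordEpisodeStatistics

env = gym.make("FrankaPegInsert-Vision-v0")  # hypothetical env id
env = RecordEpisodeStatistics(env)           # record episode statistics, as in the examples

obs = env.reset()
for _ in range(50):
    action = env.action_space.sample()        # random end-effector action, smoke-testing only
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```
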
## 1. Peg Insertion 📍

@@ -80,7 +84,9 @@ env = RecordEpisodeStatistics(env) # record episode statistics
> Env and default config are located in `serl_robot_infra/franka_env/envs/pcb_env/`
Similar to peg insertion, here we record demo trajectories with the robot, then run the learner and actor nodes.
Similar to the peg insertion task, the reward here is given by checking whether the end-effector pose matches a fixed target pose. Update the `TARGET_POSE` in [peg_env/config.py](../serl_robot_infra/franka_env/envs/peg_env/config.py) with the measured end-effector pose.
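
For intuition, here is a minimal sketch of that kind of fixed-target success check; the pose values, tolerances, and helper name are illustrative placeholders, not the implementation in the config file linked above.

```python
# Illustrative only -- the real TARGET_POSE lives in the env config and the check runs inside the env.
import numpy as np

TARGET_POSE = np.array([0.58, 0.02, 0.05, np.pi, 0.0, 0.0])    # example x, y, z, roll, pitch, yaw
TOLERANCE = np.array([0.005, 0.005, 0.005, 0.05, 0.05, 0.05])  # per-dimension tolerance (m / rad)

def reached_target(current_pose: np.ndarray) -> bool:
    """Binary success: every pose dimension is within tolerance of the fixed target."""
    return bool(np.all(np.abs(current_pose - TARGET_POSE) < TOLERANCE))

reward = float(reached_target(np.array([0.581, 0.021, 0.049, np.pi, 0.0, 0.01])))  # -> 1.0
```
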
Here we record demo trajectories with the robot, then run the learner and actor nodes.
```bash
# record demo trajectories
python record_demo.py
12 changes: 3 additions & 9 deletions examples/async_bin_relocation_fwbw_drq/async_drq_randomized.py
@@ -20,7 +20,7 @@
from serl_launcher.wrappers.chunking import ChunkingWrapper
from serl_launcher.utils.train_utils import concat_batches

from agentlace.trainer import TrainerServer, TrainerClient, TrainerTunnel
from agentlace.trainer import TrainerServer, TrainerClient
from agentlace.data.data_store import QueuedDataStore

from serl_launcher.utils.launcher import (
@@ -121,11 +121,9 @@ def actor(
data_stores: OrderedDict[str, MemoryEfficientReplayBufferDataStore],
env,
sampling_rng,
tunnel=None,
):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
for task in agents.keys():
@@ -295,12 +293,9 @@ def update_params_bw(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -542,7 +537,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -552,7 +546,7 @@ def create_replay_buffer_and_wandb_logger():
)
# actor loop
print_green("starting actor loop")
actor(agents, data_stores, env, sampling_rng, tunnel=None)
actor(agents, data_stores, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
13 changes: 4 additions & 9 deletions examples/async_cable_route_drq/async_drq_randomized.py
@@ -18,7 +18,7 @@
from serl_launcher.wrappers.chunking import ChunkingWrapper
from serl_launcher.utils.train_utils import concat_batches

from agentlace.trainer import TrainerServer, TrainerClient, TrainerTunnel
from agentlace.trainer import TrainerServer, TrainerClient
from agentlace.data.data_store import QueuedDataStore

from serl_launcher.data.data_store import MemoryEfficientReplayBufferDataStore
@@ -94,10 +94,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
success_counter = 0
@@ -219,12 +218,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -410,15 +406,14 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
sampling_rng = jax.device_put(sampling_rng, sharding.replicate())
data_store = QueuedDataStore(50000) # the queue size on the actor
# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
9 changes: 3 additions & 6 deletions examples/async_drq_sim/async_drq_sim.py
@@ -78,10 +78,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
client = TrainerClient(
"actor_env",
@@ -171,10 +170,9 @@ def update_params(params):
##############################################################################


def learner(rng, agent: DrQAgent, replay_buffer, wandb_logger=None, tunnel=None):
def learner(rng, agent: DrQAgent, replay_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -318,7 +316,6 @@ def create_replay_buffer_and_wandb_logger():
agent,
replay_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -327,7 +324,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
11 changes: 3 additions & 8 deletions examples/async_pcb_insert_drq/async_drq_randomized.py
@@ -91,10 +91,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
success_counter = 0
@@ -214,12 +213,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -391,7 +387,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
Expand All @@ -400,7 +395,7 @@ def create_replay_buffer_and_wandb_logger():

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
11 changes: 3 additions & 8 deletions examples/async_peg_insert_drq/async_drq_randomized.py
@@ -90,10 +90,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
success_counter = 0
@@ -213,12 +212,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -390,7 +386,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -399,7 +394,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
8 changes: 2 additions & 6 deletions examples/async_rlpd_drq_sim/async_rlpd_drq_sim.py
@@ -80,10 +80,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
client = TrainerClient(
"actor_env",
@@ -179,11 +178,9 @@ def learner(
replay_buffer,
demo_buffer,
wandb_logger=None,
tunnel=None,
):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -355,7 +352,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -364,7 +360,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
11 changes: 3 additions & 8 deletions examples/async_sac_state_sim/async_sac_state_sim.py
@@ -69,10 +69,9 @@ def print_green(x):
##############################################################################


def actor(agent: SACAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: SACAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
client = TrainerClient(
"actor_env",
@@ -166,12 +165,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: SACAgent, replay_buffer, replay_iterator, wandb_logger=None, tunnel=None
):
def learner(rng, agent: SACAgent, replay_buffer, replay_iterator, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -299,7 +295,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
replay_iterator=replay_iterator,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -308,7 +303,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
9 changes: 8 additions & 1 deletion serl_robot_infra/README.md
@@ -19,6 +19,7 @@ There is a Flask server which sends commands to the robot via ROS. There is a gy
conda activate serl
pip install -e .
```

### Usage

**Robot Server**
@@ -29,7 +30,13 @@ From there you should be able to navigate to `serl_robot_infra` and then simply

```bash
conda activate serl
python serl_robo_infra/robot_servers/franka_server.py --gripper_type=<Robotiq|Franka|None> --robot_ip=<robot_IP> --gripper_ip=<[Optional] Robotiq_gripper_IP>
# script to start the HTTP server and ROS controller
python serl_robot_infra/robot_servers/franka_server.py \
    --gripper_type=<Robotiq|Franka|None> \
    --robot_ip=<robot_IP> \
    --gripper_ip=<[Optional] Robotiq_gripper_IP> \
    --reset_joint_target=<[Optional] robot_joints_when_robot_resets>
```

This should start the ROS impedance controller node and the HTTP server. You can test that things are running by trying to move the end effector around; if the impedance controller is running, it should be compliant.
Expand Down
