Commit
Doc and code cleanup (#25)
* clean up doc and code nitting

Signed-off-by: youliang <[email protected]>

* fix links

Signed-off-by: youliang <[email protected]>

---------

Signed-off-by: youliang <[email protected]>
youliangtan authored Feb 14, 2024
1 parent 8ada212 commit 6ed3efb
Showing 20 changed files with 200 additions and 258 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -64,15 +64,15 @@ SERL provides a set of common libraries for users to train RL policies for robot

| Code Directory | Description |
| --- | --- |
| [serl_launcher](./serl_launcher) | Main code for SERL |
| [serl_launcher.agents](./serl_launcher/serl_launcher/agents/) | Agent Policies (e.g. DRQ, SAC, BC) |
| [serl_launcher.wrappers](./serl_launcher/serl_launcher/wrappers) | Gym env wrappers |
| [serl_launcher.data](./serl_launcher/serl_launcher/data) | Replay buffer and data store |
| [serl_launcher.vision](./serl_launcher/serl_launcher/vision) | Vision related models and utils |
| [serl_launcher](https://github.com/rail-berkeley/serl/blob/main/serl_launcher) | Main code for SERL |
| [serl_launcher.agents](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/agents/) | Agent Policies (e.g. DRQ, SAC, BC) |
| [serl_launcher.wrappers](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/wrappers) | Gym env wrappers |
| [serl_launcher.data](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/data) | Replay buffer and data store |
| [serl_launcher.vision](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/vision) | Vision related models and utils |
| [franka_sim](./franka_sim) | Franka mujoco simulation gym environment |
| [serl_robot_infra](./serl_robot_infra/) | Robot infra for running with real robots |
| [serl_robot_infra.robot_servers](./serl_robot_infra/robot_servers/) | Flask server for sending commands to robot via ROS |
| [serl_robot_infra.franka_env](./serl_robot_infra/franka_env/) | Gym env for real franka robot |
| [serl_robot_infra.robot_servers](https://github.com/rail-berkeley/serl/blob/main/serl_robot_infra/robot_servers/) | Flask server for sending commands to robot via ROS |
| [serl_robot_infra.franka_env](https://github.com/rail-berkeley/serl/blob/main/serl_robot_infra/franka_env/) | Gym env for real franka robot |
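
A minimal sketch of how these modules are typically pulled in from an example script. The import paths below are only the ones that already appear in the example files touched by this commit; this is not an exhaustive list of the public API.

```python
# Illustrative imports only -- paths copied from the example scripts changed in this commit.
from serl_launcher.wrappers.chunking import ChunkingWrapper                     # serl_launcher.wrappers
from serl_launcher.data.data_store import MemoryEfficientReplayBufferDataStore  # serl_launcher.data
from serl_launcher.utils.train_utils import concat_batches                      # training utilities

# agentlace provides the actor/learner transport used alongside serl_launcher
from agentlace.trainer import TrainerServer, TrainerClient
from agentlace.data.data_store import QueuedDataStore
```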

## Quick Start with SERL in Sim

12 changes: 9 additions & 3 deletions docs/real_franka.md
@@ -6,10 +6,14 @@ When running with a real robot, a separate gym env is needed. For our examples,

![](./images/robot_infra_interfaces.png)

Follow the [README](serl_robot_infra/README.md) in `serl_robot_infra` for installation and basic robot operation instructions.

### Installation for `serl_robot_infra`

*NOTE: The following code will not run as it is, since it will require custom data, checkpoints, and robot env. We provide the code as a reference for how to use SERL with real robots. Learn this section in incremental order, starting from the first task (peg insertion) to the last task (bin relocation). Modify the code according to your needs. *
Follow the [README](../serl_robot_infra/README.md) in `serl_robot_infra` for installation and basic robot operation instructions. It includes the instructions for installing the impedance-based [serl_franka_controllers](https://github.com/rail-berkeley/serl_franka_controllers).

After the installation, you should be able to run the robot server and interact with the real hardware through the `franka_env` gym environment.

> NOTE: The following example code will not run as-is, since it requires custom data, checkpoints, and a robot env. We provide the code as a reference for how to use SERL with real robots. Work through this section in order, starting from the first task (peg insertion) and ending with the last task (bin relocation), and modify the code according to your needs.
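
As a rough illustration of the workflow above (robot server running in one terminal, the gym env used from your training script), here is a minimal interaction sketch. The env id, the `franka_env` registration side effect, and the classic-gym `reset`/`step` signatures are assumptions for illustration, not code from the repo; adapt them to your installed gym/gymnasium version and task, and only run it with the robot in a safe configuration.

```python
# Hypothetical smoke test -- env id and API details are assumptions, adjust to your setup.
import gym
import franka_env  # noqa: F401 -- assumed to register the real-robot envs on import
from gym.wrappers import RecordEpisodeStatistics

env = gym.make("FrankaPegInsert-Vision-v0")  # hypothetical env id
env = RecordEpisodeStatistics(env)           # record episode statistics, as in the examples

obs = env.reset()
for _ in range(50):
    action = env.action_space.sample()        # random end-effector action, smoke-testing only
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```
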
## 1. Peg Insertion 📍

@@ -80,7 +84,9 @@ env = RecordEpisodeStatistics(env) # record episode statistics
> Env and default config are located in `serl_robot_infra/franka_env/envs/pcb_env/`
Similar to peg insertion, here we record demo trajectories with the robot, then run the learner and actor nodes.
Similar to the peg insertion task, the reward here is given by checking whether the end-effector pose matches a fixed target pose. Update the `TARGET_POSE` in [peg_env/config.py](../serl_robot_infra/franka_env/envs/peg_env/config.py) with the measured end-effector pose.
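
For intuition, here is a minimal sketch of that kind of fixed-target success check; the pose values, tolerances, and helper name are illustrative placeholders, not the implementation in the config file linked above.

```python
# Illustrative only -- the real TARGET_POSE lives in the env config and the check runs inside the env.
import numpy as np

TARGET_POSE = np.array([0.58, 0.02, 0.05, np.pi, 0.0, 0.0])    # example x, y, z, roll, pitch, yaw
TOLERANCE = np.array([0.005, 0.005, 0.005, 0.05, 0.05, 0.05])  # per-dimension tolerance (m / rad)

def reached_target(current_pose: np.ndarray) -> bool:
    """Binary success: every pose dimension is within tolerance of the fixed target."""
    return bool(np.all(np.abs(current_pose - TARGET_POSE) < TOLERANCE))

reward = float(reached_target(np.array([0.581, 0.021, 0.049, np.pi, 0.0, 0.01])))  # -> 1.0
```
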
Here we record demo trajectories with the robot, then run the learner and actor nodes.
```bash
# record demo trajectories
python record_demo.py
12 changes: 3 additions & 9 deletions examples/async_bin_relocation_fwbw_drq/async_drq_randomized.py
@@ -20,7 +20,7 @@
from serl_launcher.wrappers.chunking import ChunkingWrapper
from serl_launcher.utils.train_utils import concat_batches

from agentlace.trainer import TrainerServer, TrainerClient, TrainerTunnel
from agentlace.trainer import TrainerServer, TrainerClient
from agentlace.data.data_store import QueuedDataStore

from serl_launcher.utils.launcher import (
@@ -121,11 +121,9 @@ def actor(
data_stores: OrderedDict[str, MemoryEfficientReplayBufferDataStore],
env,
sampling_rng,
tunnel=None,
):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
for task in agents.keys():
@@ -295,12 +293,9 @@ def update_params_bw(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -542,7 +537,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -552,7 +546,7 @@ def create_replay_buffer_and_wandb_logger():
)
# actor loop
print_green("starting actor loop")
actor(agents, data_stores, env, sampling_rng, tunnel=None)
actor(agents, data_stores, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
13 changes: 4 additions & 9 deletions examples/async_cable_route_drq/async_drq_randomized.py
@@ -18,7 +18,7 @@
from serl_launcher.wrappers.chunking import ChunkingWrapper
from serl_launcher.utils.train_utils import concat_batches

from agentlace.trainer import TrainerServer, TrainerClient, TrainerTunnel
from agentlace.trainer import TrainerServer, TrainerClient
from agentlace.data.data_store import QueuedDataStore

from serl_launcher.data.data_store import MemoryEfficientReplayBufferDataStore
@@ -94,10 +94,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
success_counter = 0
@@ -219,12 +218,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -410,15 +406,14 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
sampling_rng = jax.device_put(sampling_rng, sharding.replicate())
data_store = QueuedDataStore(50000) # the queue size on the actor
# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
9 changes: 3 additions & 6 deletions examples/async_drq_sim/async_drq_sim.py
@@ -78,10 +78,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
client = TrainerClient(
"actor_env",
@@ -171,10 +170,9 @@ def update_params(params):
##############################################################################


def learner(rng, agent: DrQAgent, replay_buffer, wandb_logger=None, tunnel=None):
def learner(rng, agent: DrQAgent, replay_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -318,7 +316,6 @@ def create_replay_buffer_and_wandb_logger():
agent,
replay_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -327,7 +324,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
11 changes: 3 additions & 8 deletions examples/async_pcb_insert_drq/async_drq_randomized.py
@@ -91,10 +91,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
success_counter = 0
@@ -214,12 +213,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -391,7 +387,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
Expand All @@ -400,7 +395,7 @@ def create_replay_buffer_and_wandb_logger():

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
11 changes: 3 additions & 8 deletions examples/async_peg_insert_drq/async_drq_randomized.py
@@ -90,10 +90,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
if FLAGS.eval_checkpoint_step:
success_counter = 0
@@ -213,12 +212,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None, tunnel=None
):
def learner(rng, agent: DrQAgent, replay_buffer, demo_buffer, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -390,7 +386,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -399,7 +394,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
8 changes: 2 additions & 6 deletions examples/async_rlpd_drq_sim/async_rlpd_drq_sim.py
@@ -80,10 +80,9 @@ def print_green(x):
##############################################################################


def actor(agent: DrQAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: DrQAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
client = TrainerClient(
"actor_env",
@@ -179,11 +178,9 @@ def learner(
replay_buffer,
demo_buffer,
wandb_logger=None,
tunnel=None,
):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -355,7 +352,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
demo_buffer=demo_buffer,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -364,7 +360,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
11 changes: 3 additions & 8 deletions examples/async_sac_state_sim/async_sac_state_sim.py
@@ -69,10 +69,9 @@ def print_green(x):
##############################################################################


def actor(agent: SACAgent, data_store, env, sampling_rng, tunnel=None):
def actor(agent: SACAgent, data_store, env, sampling_rng):
"""
This is the actor loop, which runs when "--actor" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
client = TrainerClient(
"actor_env",
@@ -166,12 +165,9 @@ def update_params(params):
##############################################################################


def learner(
rng, agent: SACAgent, replay_buffer, replay_iterator, wandb_logger=None, tunnel=None
):
def learner(rng, agent: SACAgent, replay_buffer, replay_iterator, wandb_logger=None):
"""
The learner loop, which runs when "--learner" is set to True.
NOTE: tunnel is used the transport layer for multi-threading
"""
# To track the step in the training loop
update_steps = 0
@@ -299,7 +295,6 @@ def create_replay_buffer_and_wandb_logger():
replay_buffer,
replay_iterator=replay_iterator,
wandb_logger=wandb_logger,
tunnel=None,
)

elif FLAGS.actor:
@@ -308,7 +303,7 @@

# actor loop
print_green("starting actor loop")
actor(agent, data_store, env, sampling_rng, tunnel=None)
actor(agent, data_store, env, sampling_rng)

else:
raise NotImplementedError("Must be either a learner or an actor")
9 changes: 8 additions & 1 deletion serl_robot_infra/README.md
@@ -19,6 +19,7 @@ There is a Flask server which sends commands to the robot via ROS. There is a gy
conda activate serl
pip install -e .
```

### Usage

**Robot Server**
@@ -29,7 +30,13 @@ From there you should be able to navigate to `serl_robot_infra` and then simply

```bash
conda activate serl
python serl_robo_infra/robot_servers/franka_server.py --gripper_type=<Robotiq|Franka|None> --robot_ip=<robot_IP> --gripper_ip=<[Optional] Robotiq_gripper_IP>
# script to start the HTTP server and ROS controller
python serl_robot_infra/robot_servers/franka_server.py \
    --gripper_type=<Robotiq|Franka|None> \
    --robot_ip=<robot_IP> \
    --gripper_ip=<[Optional] Robotiq_gripper_IP> \
    --reset_joint_target=<[Optional] robot_joints_when_robot_resets>
```

This should start the ROS impedance controller node and the HTTP server. You can test that things are running by trying to move the end effector around; if the impedance controller is running, it should be compliant.
Expand Down
