Port HIL-SERL #565

Open · wants to merge 26 commits into main from user/michel-aractingi/2024-11-27-port-hil-serl

Conversation

@michel-aractingi (Collaborator) commented Dec 9, 2024

What this does

Adds HIL-SERL to the policies of LeRobot in lerobot/common/policies/hilserl/.

What this PR contains so far

  1. The ability to assign binary rewards while recording datasets, in the record function of lerobot/scripts/control_robot.py (a minimal illustrative sketch follows this list).
  2. Reward classifier:
    • Code to define and train a reward classifier model to detect successful tasks, in lerobot/common/policies/hilserl/classifier.
    • Script to train the reward classifier: lerobot/scripts/train_hilserl_classifier.py.
  3. Rollouts on the real robot and human intervention: in lerobot/scripts/eval_on_robot.py we added the ability to run policy rollouts on the real robot. Moreover, if you have a leader arm, you can stop the policy actions being rolled out and take over.
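For illustration, here is a minimal sketch of what per-frame binary reward annotation could look like (item 1 above). The "next.reward" key and the success-index convention are assumptions made for this sketch, not necessarily what the record function in control_robot.py does.

```python
# Hypothetical sketch: tag each recorded frame with a binary success reward.
# The "next.reward" key and the success-index convention are assumptions for
# illustration only.
from typing import Any


def annotate_episode(frames: list[dict[str, Any]], success_step: int | None) -> list[dict[str, Any]]:
    """Mark every frame from success_step onward with reward 1, everything before with 0."""
    for idx, frame in enumerate(frames):
        success = success_step is not None and idx >= success_step
        frame["next.reward"] = 1 if success else 0
    return frames
```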

How to test:

  1. Annotate episodes with reward during recordings:
```bash
python lerobot/scripts/control_robot.py record \
    --robot-path lerobot/configs/robot/moss.yaml \
    --fps 30 \
    --root data \
    --repo-id ${HF_USER}/moss_test \
    --tags moss tutorial \
    --warmup-time-s 5 \
    --episode-time-s 40 \
    --reset-time-s 10 \
    --num-episodes 2 \
    --push-to-hub 1 \
    --assign_rewards 1
```
  2. Train the reward classifier (a minimal classifier sketch follows this list):
```bash
python lerobot/scripts/train_hilserl_classifier.py --config-name policy/reward_classifier.yaml
```
  3. Run an example of evaluation on the robot and test human interventions (a minimal intervention sketch follows this list):
```bash
python lerobot/scripts/eval_on_robot.py --robot-path lerobot/configs/robot/koch.yaml
```
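For illustration, here is a minimal sketch of what a binary success (reward) classifier could look like: a pretrained vision backbone with a small head trained with binary cross-entropy. The resnet18 backbone, head sizes, and training loop below are assumptions for this sketch; the actual model lives in lerobot/common/policies/hilserl/classifier.

```python
# Hypothetical sketch of a binary reward classifier: a pretrained vision
# backbone followed by a small MLP head trained with binary cross-entropy.
# Backbone choice, head sizes, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torchvision


class RewardClassifier(nn.Module):
    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()  # keep the 512-dim pooled features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one logit: P(task success)
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(images)).squeeze(-1)


# One training step on a dummy batch: images (B, 3, H, W), labels in {0, 1}.
model = RewardClassifier()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,)).float()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```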

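And a minimal sketch of the take-over mechanism during evaluation: the policy acts by default, and if the leader arm moves beyond a small threshold its positions are sent instead. The robot/policy method names and the threshold-based detection are assumptions for this sketch; the real logic is in lerobot/scripts/eval_on_robot.py.

```python
# Hypothetical sketch of the take-over logic: run the policy, but if the leader
# arm moves beyond a small threshold, use the leader's joint positions as the
# action instead. The robot/policy interfaces below are assumptions.
import torch


def rollout_with_intervention(robot, policy, num_steps: int, takeover_threshold: float = 0.05):
    prev_leader = robot.read_leader_positions()  # hypothetical helper
    for _ in range(num_steps):
        observation = robot.capture_observation()
        policy_action = policy.select_action(observation)

        leader = robot.read_leader_positions()
        human_is_moving = torch.linalg.norm(leader - prev_leader) > takeover_threshold
        prev_leader = leader

        # Human action overrides the policy action whenever the leader arm moves.
        action = leader if human_is_moving else policy_action
        robot.send_action(action)
```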
References:

  1. HIL-SERL reference implementation: https://github.com/rail-berkeley/hil-serl/tree/main
  2. Reward assignment PR: Reward assignment during recording (#518) and reward classifier PR: Reward classifier and training (#528), by @ChorntonYoel
  3. Human interventions: Add human intervention mechanism and eval_robot script to evaluate policy on the robot (#541)

@Cadene requested review from aliberts and Cadene on December 9, 2024 20:48
@michel-aractingi force-pushed the user/michel-aractingi/2024-11-27-port-hil-serl branch from 3d7e74d to def42ff on December 17, 2024 15:22
KeWang1017 and others added 2 commits December 17, 2024 17:58
…ing logic

- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.

These changes improve the flexibility and performance of the SAC implementation.
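For illustration, a minimal sketch of two of the mechanisms mentioned above, under the assumption that num_subsample_critics selects a random subset of the target critic ensemble and that the default target entropy is derived from the action dimension; the exact formulas used in this PR are not stated here.

```python
# Hypothetical sketch of two mechanisms named in the commit message:
#  * a default target entropy derived from the action dimension when none is given
#  * subsampling a random subset of target critics (REDQ-style) for the target Q
import random
import torch


def default_target_entropy(action_dim: int) -> float:
    # A common SAC heuristic; whether the PR uses -dim or -dim/2 is an assumption here.
    return -float(action_dim) / 2


def subsampled_target_q(target_critics, observations, actions, num_subsample_critics: int) -> torch.Tensor:
    """Min Q over a random subset of the target critic ensemble."""
    chosen = random.sample(list(target_critics), k=num_subsample_critics)
    q_values = torch.stack([critic(observations, actions) for critic in chosen], dim=0)
    return q_values.min(dim=0).values
```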
@mydhui commented Dec 18, 2024

@michel-aractingi Hi, can you elaborate more on how to test HIL-SERL?

During step 1, for instance on a cube-grasping task, should we record failure samples on purpose, or is a reward transition from 0 to 1 after a successful grasp enough?

What is the expected behavior in step 3 ("eval on the robot and test human interventions")? And would this algorithm perform better than ACT?

Thanks.

helper2424 and others added 10 commits December 23, 2024 10:43
…n handling

- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.

These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.
- Updated standard deviation parameterization in SACConfig to 'softplus' with defined min and max values for improved stability.
- Modified action sampling in SACPolicy to use reparameterized sampling, ensuring better gradient flow and log probability calculations.
- Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency.
- Increased evaluation frequency in YAML configuration to 50000 for more efficient training cycles.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
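For illustration, a minimal sketch of the sampling scheme described in this commit: a softplus-parameterized standard deviation clamped to a min/max range, a reparameterized sample, a tanh squash, and the corresponding log-probability correction. The bounds and function names are assumptions.

```python
# Hypothetical sketch: softplus-parameterized std (clamped to [std_min, std_max]),
# reparameterized sampling, tanh squashing, and the log-prob correction for the
# change of variables. Bounds and names are illustrative assumptions.
import math

import torch
import torch.nn.functional as F


def sample_action(mean: torch.Tensor, std_param: torch.Tensor,
                  std_min: float = 1e-3, std_max: float = 5.0):
    std = F.softplus(std_param).clamp(std_min, std_max)
    dist = torch.distributions.Normal(mean, std)

    pre_tanh = dist.rsample()      # reparameterized sample: gradients flow through
    action = torch.tanh(pre_tanh)  # squash into (-1, 1)

    # log pi(a) = log N(u) - log(1 - tanh(u)^2), using the numerically stable form
    # log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
    correction = 2.0 * (math.log(2.0) - pre_tanh - F.softplus(-2.0 * pre_tanh))
    log_prob = (dist.log_prob(pre_tanh) - correction).sum(dim=-1)
    return action, log_prob
```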
…d stability

- Updated SACConfig to replace standard deviation parameterization with log_std_min and log_std_max for better control over action distributions.
- Modified SACPolicy to streamline action selection and log probability calculations, enhancing stochastic behavior.
- Removed deprecated TanhMultivariateNormalDiag class to simplify the codebase and improve maintainability.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
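For contrast with the softplus scheme above, a minimal sketch of the clamped log_std parameterization this commit switches to; the bounds are illustrative assumptions.

```python
# Hypothetical sketch: parameterize the policy std via a clamped log_std head
# (log_std_min / log_std_max) instead of a softplus std.
import torch


def build_policy_distribution(mean: torch.Tensor, log_std: torch.Tensor,
                              log_std_min: float = -5.0, log_std_max: float = 2.0) -> torch.distributions.Normal:
    log_std = torch.clamp(log_std, log_std_min, log_std_max)
    return torch.distributions.Normal(mean, log_std.exp())
```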
@michel-aractingi force-pushed the user/michel-aractingi/2024-11-27-port-hil-serl branch from bd8d252 to 35de91e on December 30, 2024 13:47