
Poor success rate in complex scenarios #545

Open
mydhui opened this issue Dec 4, 2024 · 4 comments


mydhui commented Dec 4, 2024

Hi, I used the Moss robot to collect data and train an ACT policy. With a single Lego piece, it finishes the grabbing task at a high success rate after recording 50+ episodes with different pose and location variants, but generalization to multiple pieces at random locations is not promising.

I then added complexity (for example, 6 pieces with different colors, as in the picture below), placed the Lego pieces somewhat randomly, and recorded each episode continuously until all the pieces were grabbed (rather than 1 piece per episode). Furthermore, the pieces were grabbed in a fixed order.
[image: six Lego pieces of different colors on the workspace]

Here is what I found:

  1. The trained policy does not work if the gripping sequence is randomized; in other words, it has to keep a fixed spatial order, e.g. from upper left to lower right.

  2. The trained policy does not work if the [location, color, pose] combination was not seen in the training dataset, especially location combinations.

  3. At first I suspected that the fixed iPhone and Mac cameras could not provide enough depth perception, so I bought a wide-angle USB camera and mounted it on the gripper; as a result, the success rate did not get higher.
    [image: wide-angle USB camera mounted on the gripper]

  4. Enlarging the dataset to 120+ episodes did not produce an obvious change.

I was wondering how to improve this task. Is the method I used to record data wrong, or is the generalization ability of ACT limited?

Looking forward to hearing your answers or experience.

@mydhui mydhui changed the title Poor success rate on complex scenarios Poor success rate in complex scenarios Dec 4, 2024

astroyat commented Dec 5, 2024

Have you tried picking only one piece out of the 6 per episode?


mydhui commented Dec 5, 2024

@astroyat My original thought was that resetting to the start position would be time-consuming, so I haven't tried this method. Do you have experience with this? In your experiments, did 1 piece per episode help improve the success rate?

I think the essential question is whether this algorithm can learn to plan in a long-horizon picking scenario.


astroyat commented Dec 5, 2024

I found the success rate is better when the joint graphs are consistent across all the episodes, even for random locations.
Here is my recording with one camera, starting from the top position instead of the rest position, with the gripper tip modded to be wider:
https://huggingface.co/spaces/lerobot/visualize_dataset?dataset=astroyat%2Fcube&episode=0
Here is my eval with a high success rate; the eval graph matches the recording graph:
https://huggingface.co/spaces/lerobot/visualize_dataset?dataset=astroyat%2Feval_cube&episode=0
I tried picking all 5 Legos in one episode and the joint graph was chaotic, so the eval was very bad.
For long-horizon multi-task work, use the multiple-ACT method here: https://github.com/1g0rrr/SimpleAutomation
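If it helps, the "consistent joint graphs" idea above can be checked numerically rather than by eye. Below is a minimal sketch (not LeRobot API; `consistency_score` and the episode arrays are hypothetical names) that resamples each episode's joint trajectory to a common length and scores the spread across episodes, so that chaotic multi-pick demos stand out with a higher score:

```python
# Sketch: quantify joint-trajectory consistency across recorded episodes.
# Assumes each episode is a (T, num_joints) NumPy array of joint positions.
# All names here are illustrative, not part of the LeRobot API.
import numpy as np

def resample(traj, n=100):
    """Linearly resample a (T, J) trajectory to n timesteps."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack(
        [np.interp(t_new, t_old, traj[:, j]) for j in range(traj.shape[1])],
        axis=1,
    )

def consistency_score(episodes, n=100):
    """Mean per-timestep standard deviation across episodes.
    Lower = more consistent joint graphs."""
    resampled = np.stack([resample(e, n) for e in episodes])  # (E, n, J)
    return float(resampled.std(axis=0).mean())

# Toy check: three near-identical demos score low; a reversed demo raises it.
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=(50, 6)), axis=0)
consistent = [base + rng.normal(scale=0.01, size=base.shape) for _ in range(3)]
chaotic = [base, base[::-1].copy()]
print(consistency_score(consistent) < consistency_score(chaotic))  # True
```

A score like this could be computed per dataset before training to decide whether the demos are uniform enough, though the threshold would have to be tuned empirically.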


mydhui commented Dec 6, 2024

> I found the success rate is better when the joints graph is consistent for all the episodes even for random location. […]

I see, thank you so much for the reply. I re-recorded picking the six Legos at 1 piece per episode; the training loss converged faster and ended lower than in the long-horizon task.

But disappointingly, the behavior is similar to before: when the Lego grasping order was seen in the dataset, it performs well, but when the state makes the order hard to define, ACT starts to collapse. Let me give an example. I recorded data in a left-to-right, top-to-bottom order, following a dummy block rule (subjectively determined) indicated by the black dotted lines; the numbers show the order in which I grasp the Legos.
[image: workspace divided into blocks by dotted lines, with numbered grasp order]

Upper blocks are prioritized over lower ones, and within a block, left pieces are prioritized over right ones.

When a Lego piece sits at the junction of two blocks, the robot arm gets stuck somewhere in the middle (e.g. between the blue and the green blocks, as below). Without intervening by relocating the piece to a state the model has seen, it never self-corrects.
[image: arm stuck between the blue and green blocks]

I think one reason is that even humans cannot maintain consistent and coherent actions in those situations. Does this mean that an unordered grasping sequence is very difficult to learn in the current algorithm framework? Any ideas or comments?
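One way to remove the ambiguity at block junctions might be to replace the subjective block boundaries with a single deterministic rule applied at data-collection time, so every layout has exactly one valid grasp order. A sketch (all names hypothetical; `pieces` would come from whatever detection you use) that clusters pieces into rows by a y-tolerance and sorts left-to-right within each row:

```python
# Sketch: deterministic grasp ordering to replace subjective block boundaries.
# Clusters piece centroids into rows using a y-tolerance, then sorts each row
# left-to-right, so a piece near a "junction" is always assigned to one row.
def grasp_order(pieces, row_tol=40):
    """pieces: list of (x, y) centroids in image coordinates.
    Returns indices in the order the pieces should be grasped."""
    idx = sorted(range(len(pieces)), key=lambda i: pieces[i][1])  # sort by y
    rows, current = [], [idx[0]]
    for i in idx[1:]:
        # Same row if within row_tol of the row's first (topmost) piece.
        if pieces[i][1] - pieces[current[0]][1] <= row_tol:
            current.append(i)
        else:
            rows.append(current)
            current = [i]
    rows.append(current)
    order = []
    for row in rows:
        order.extend(sorted(row, key=lambda i: pieces[i][0]))  # then by x
    return order

# Two rows of two pieces each: order is top row left-to-right, then bottom row.
pieces = [(120, 60), (300, 65), (80, 200), (260, 210)]
print(grasp_order(pieces))  # [0, 1, 2, 3]
```

The point is not this particular rule but that the demonstrator follows one mechanical tie-break consistently, so the policy never sees two different orders for near-identical states; whether that fixes the junction stalls would need testing.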
