Poor success rate in complex scenarios #545
Comments
Have you tried picking only one piece out of the 6 per episode?
@astroyat My original thought was that resetting to the start position would be too time-consuming, so I haven't tried this method. Do you have experience with this? In your experiment, did one piece per episode help improve the success rate? I think the essential problem is whether this algorithm can learn planning in a long-horizon picking scenario.
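If resetting is the main cost, one pattern worth trying is to keep recording the long multi-pick episodes and split them offline into single-pick episodes at the gripper-release events. The sketch below is hypothetical; the data layout, function names, and thresholds are assumptions for illustration, not anything from this thread's codebase.

```python
import numpy as np

def split_by_grasp(gripper_pos, close_thresh=0.2, min_len=30):
    """Split one long multi-pick episode into per-pick segments.

    Assumed data layout: `gripper_pos` is a 1-D array of normalized
    gripper openings per timestep (0 = closed, 1 = open). A segment
    ends right after each close->open transition (piece released).
    The thresholds are illustrative values, not tuned ones.
    """
    closed = gripper_pos < close_thresh
    # indices where the gripper re-opens after having been closed
    release = np.where(closed[:-1] & ~closed[1:])[0] + 1
    bounds = [0, *release.tolist(), len(gripper_pos)]
    # drop segments too short to be a real pick
    return [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if b - a >= min_len]

# e.g. segments = split_by_grasp(episode["gripper_pos"]); each (start, end)
# slice can then be saved as its own single-pick training episode.
```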
I found the success rate is better when the joint-position curves are consistent across all episodes, even with random locations.
I see, thank you so much for the reply. I recorded picking the six Legos one piece per episode; the training loss converged faster and ended lower than in the long-horizon task. Disappointingly, though, the behavior is similar to before: when the Lego grasping order matches what was learned from the dataset it performs well, but when the state makes the order hard to define, ACT starts to collapse. Let me give an example. I recorded data in left-to-right, top-to-bottom order, following a dummy block rule (subjectively determined) indicated by the black dotted line; the numbers show the order in which I grasp the Legos. Upper blocks are prioritized over lower ones, and within a block, left pieces are prioritized over right ones. When a Lego piece sits at the junction of two blocks, the robot arm gets stuck somewhere in the middle (e.g. between the blue and the green piece, as below). Without intervening to relocate the state to somewhere the model has seen, it never self-corrects. I think one reason is that even humans cannot maintain consistent, coherent actions in those situations. Does this mean that unordered grasping sequences are very difficult to learn in the current algorithm framework? Any ideas or comments?
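The stuck-in-the-middle behavior is consistent with mode averaging: if a junction piece was grasped before its neighbor in some episodes and after it in others, visually similar states carry two different next-waypoint targets, and a policy trained with a regression loss (ACT's reconstruction term is L1) can land between them. A minimal numeric sketch, with hypothetical positions:

```python
import numpy as np

# Two demonstrated "modes" for the same ambiguous junction state:
# half the episodes reach for the blue piece first, half for the green.
# All positions are hypothetical (x, y) workspace coordinates.
blue_first = np.array([0.30, 0.10])
green_first = np.array([0.45, 0.12])
targets = np.array([blue_first] * 25 + [green_first] * 25)

# The minimizer of an L2 loss over conflicting targets is their mean,
# which lies between the two pieces -- roughly where the arm stalls.
print(targets.mean(axis=0))  # [0.375 0.11 ]

# An L1 loss picks the coordinate-wise median instead, but with an even
# 25/25 split that is still ambiguous; only a consistent tie-break rule
# in the demonstrations (one piece always first) removes the conflict.
```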
Hi, I used a Moss robot to play with and train an ACT policy. With one Lego piece, it can finish the grasping task at a high success rate after recording 50+ episodes with different pose and location variants, but generalization to multiple pieces at random locations is not promising.
When I started to add complexity (for example, 6 pieces with different colors, like the picture below), I placed the Lego pieces somewhat randomly and recorded each episode continuously until all the pieces were grasped (rather than 1 piece per episode); furthermore, the episodes were recorded in a fixed grasping order.
Here is what I found:
The trained policy cannot work if the gripping sequence is randomized; in other words, it has to keep a fixed spatial order, e.g. from upper left to lower right (see the sketch after this list).
The trained policy cannot work if the [location, color, pose] combination was not seen in the training dataset, especially location combinations.
At first I suspected that the fixed iPhone and Mac cameras could not provide enough depth perception, so I bought a wide-angle USB camera and mounted it on the gripper; the success rate did not improve.
Enlarging the dataset to 120+ episodes did not bring an obvious change.
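A cheap diagnostic for the fixed-order finding above (a hypothetical sketch; the per-episode data layout and row height are assumed for illustration) is to check whether every recorded episode actually follows the same row-then-left-to-right grasp convention, since episodes that violate it hand the policy conflicting targets for similar states:

```python
from collections import Counter

def follows_convention(grasp_xy, row_height=0.05):
    """True if a grasp sequence is sorted by (row bucket, then x).

    grasp_xy: list of (x, y) piece positions in the order they were
    grasped in one episode. `row_height` is an assumed bucket size.
    """
    keys = [(int(y // row_height), x) for x, y in grasp_xy]
    return keys == sorted(keys)

# Two synthetic episodes: the first follows the row-then-x convention;
# in the second, two pieces near a row boundary were grasped in the
# opposite order -- exactly the junction ambiguity described above.
episodes = [
    [(0.10, 0.02), (0.20, 0.03), (0.12, 0.07)],
    [(0.12, 0.051), (0.10, 0.049), (0.20, 0.06)],
]
print(Counter(follows_convention(ep) for ep in episodes))
# Counter({True: 1, False: 1}); a large False count flags inconsistent data.
```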
I was wondering how to improve this task: is the method I used to record data wrong, or is the generalization of ACT limited?
Looking forward to hearing your answers or experiences.