-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delta indices simplification #791
base: main
Are you sure you want to change the base?
Delta indices simplification #791
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best code is no code.
This looks good, thanks! Left a few minor comments.
@Cadene I know you had some opinion about keeping the spec in delta_timestamps rather than delta_indices. I think the simplification here is worth it, besides right now we don't really have any use case or workflow where delta_timestamps is necessary over having delta_indices, AFAIK.
On a side note, I'm curious about how we should handle unsynced features like cameras with different fps, or variable refresh rate sensors. This is not directly an issue with using indices over timestamps but related nonetheless. Happy to get your opinion about this @IliaLarchenko @Cadene
Co-authored-by: Simon Alibert <[email protected]>
Co-authored-by: Simon Alibert <[email protected]>
Co-authored-by: Simon Alibert <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Simon Alibert <[email protected]>
We still have a main dataset level fps and timestamps for each index and we check that all of them are valid and inside tolerance bounds: lerobot/lerobot/common/datasets/lerobot_dataset.py Lines 505 to 509 in 8b2effa
We do the same check when we save an episode: lerobot/lerobot/common/datasets/lerobot_dataset.py Lines 883 to 889 in 8b2effa
Then for actually querying video frames we use timestamps: lerobot/lerobot/common/datasets/lerobot_dataset.py Lines 731 to 735 in 8b2effa
And when we actually decode video we check it again: lerobot/lerobot/common/datasets/video_utils.py Lines 104 to 114 in 8b2effa
|
Thanks for your work. Indeed our code can be improved and simplified. However, I didnt understand your argument for changing the interface from timestamps to indices. I agree it simplifies things downstream, but for me it is at the expense of expressivity. Maybe the solution would be to simplify the code downstream, instead of upstream. For instance, the policy should not have indices defined, but timestamps. Here is my argument for keeping timestamps as input: it is more expressive than indices. In this example, if we were to use indices, even with this comment, we have a hard time understanding what -75 , -50, -25, etc. mean. Instead we have to map these indices into timestamps, because it's more meaningful for us.
It's also important to reason in timestamps, because at 10 fps, an index of -10 corresponds to -1 second, but at 50 fps, it's - 20ms which doesnt correspond at all to the same temporal learning for your model. By forcing the interface of dataset to be in timestamps, it's easier to not make this mistakes. I think there are some issues in our API, but the best way to take the correct decisions is to make progress on other features such as training on multiple datasets with various fps. It will automatically raise some issues regarding timestamps and indices. As of now, I would prefer to not make this change yet. What do you think? |
I don’t have a strong preference for indices over timestamps. Currently, timestamps are only used in examples with a manual training loop. In practice, if you train a model with any policy, you specify indices, and inside the dataset, you use indices to query items. Regardless of the approach, you need to account for FPS when specifying either "observation.state": [dt * fps for dt in [-1.5, -1, -0.5, -0.20, -0.10, 0]] For observations, timestamps feel more natural, while for actions, indices make more sense. For example, if you want to predict the next 2 seconds of actions, you have to account for FPS anyway:
So ultimately, they’re not that different, but my approach skips a couple of intermediate steps. For multi-dataset scenarios, this gets even trickier. If one dataset has 10 FPS and another has 50 FPS, it's unclear what the best way to handle them is. My initial goal was to remove unnecessary conversions and allow policies to define more flexible class RandomizedDeltaIndices:
def __iter__(self):
return iter([
random.randint(-25, -15),
random.randint(-15, -5),
*range(-4, 1)
]) However, this approach can be a bit messy. An alternative is passing a |
A thought: For delta timestamps given as input to a dataset which don't match its fps, we can do some interpolation for state and action, and returns the closest frames for the vision modalities. Thus, in the multi-dataset case, we can keep the same delta timestamps for both. Then for training your DOT policy with some random delta timestamps, we can provide a "data augmentation" callable function to the dataset. In any case, I think the delta timestamps interface to LeRobotDataset is to be prefered over delta indices. |
Got it. So, I can close this PR. And later when DOT policy is integrated I can try to come up with some nice solutions for delta_indices augmentaitons. |
What this does
This PR removes unnecessary back-and-forth conversion between delta_indices and delta_timestamps.
datasets/factory.py
we transform them into the delta timestamps by dividing them by fps:lerobot_dataset.py
we useget_delta_indices
fromdataset/utils.py
to transform them back by multiplying them by fps:In the end, we use delta_indices, which we set up at the very beginning.
Basically we define delta_indices -> transform them to delta_timestapms -> trainsform them back to delta_indices
All of this happens before we query data from the dataset. When the actual query happens we generate
query_indices
fromdelta_indices
and alsoquery_timestamps
fromdelta_indices
and use them for the actual data query.I think that the transformation
delta_indices
->delta_timestapms
->delta_indices
is completely unnecessary. Because we definedelta_indices
in the policy and usedelta_indices
inside the dataset.I removed all unnecessary transformations, all tests related to functions that were doing these transformations, and fixed examples.
Removing these steps doesn't have much impact on performance but lets us clean up the code. And also opens up some other opportunities. As we directly use
delta_indices
defined in the policy we can use something more flexible than a simple list.How it was tested
This PR doesn't break anything that is using
train.py
all policies work fine without any changes. All configs already usedelta_indices
so don't need to change anything.But it can break someone's custom training pipeline because LerobotDataset expects
delta_indices
instead ofdelta_timestapms
parameter now but it is very easy to fix as you can always getdelta_indices
by multiplyingdelta_timestapms
by fps.How to checkout & try? (for the reviewer)
You can try to train any policy, eg: