Code Example: Datasets, in practice


This section shows you how to work with robotics datasets from Hugging Face using the LeRobotDataset class. We’ll start with simple examples and gradually add complexity, so you can copy and adapt the approach that best fits your project.

The key thing to understand is that any dataset on the Hub that follows LeRobot’s format (with tabular data, visual data, and metadata included) can be loaded with just one line of code.
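
For example, the pick-and-place dataset used throughout this section loads in a single call (the same one-liner reappears in the download example below):

from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Downloads and caches the dataset on first use
dataset = LeRobotDataset("lerobot/svla_so101_pickplace")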

When working with robotics data, you often need to look at multiple time steps at once rather than single data points. Why? Most robot learning algorithms need to see how things change over time. For example, to pick up an object, a robot might need to see what happened in the last few moments to understand the current situation better. Similarly, many algorithms work better when they can plan several actions ahead rather than just deciding what to do right now.

LeRobotDataset makes this easy with “temporal windowing.” You simply declare which time offsets you want (e.g. the current frame plus the two previous ones), and it automatically handles the complexity of fetching those frames, even when some are missing at the beginning or end of an episode.

(Figure: streaming multiple frames across a temporal window)

Temporal Windows Explained:

  • Observation history: [-0.2, -0.1, 0.0] gives you observations from 200 ms ago, 100 ms ago, and now
  • Action sequences: [0.0, 0.1, 0.2] provides the current action and the next two (100 ms apart)
  • Automatic padding: Missing frames at episode boundaries are handled automatically. The dataset always returns the requested number of frames, applying padding where necessary.
  • Mask included: Know which frames are real vs. padded for proper training (see the sketch after this list)
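
As a quick check of the padding behavior, here is a minimal sketch. It assumes the padding masks are exposed under keys with an _is_pad suffix, as in current LeRobot releases; verify the exact key names against your installed version:

from lerobot.datasets.lerobot_dataset import LeRobotDataset

delta_timestamps = {
    "observation.images.wrist_camera": [-0.2, -0.1, 0.0],  # 200 ms ago, 100 ms ago, now
}

dataset = LeRobotDataset(
    "lerobot/svla_so101_pickplace",
    delta_timestamps=delta_timestamps,
)

sample = dataset[0]  # first frame of the first episode, so the earlier frames must be padded
frames = sample["observation.images.wrist_camera"]  # stacked window of 3 frames
pad_mask = sample["observation.images.wrist_camera_is_pad"]  # True where a frame was padded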

Conveniently, using LeRobotDataset with a PyTorch DataLoader automatically collates the individual sample dictionaries into a single dictionary of batched tensors, ready for downstream training or inference. LeRobotDataset also natively supports streaming: with a one-line change, you can stream a large dataset hosted on the Hugging Face Hub instead of downloading it. Streaming supports high-performance batch processing (roughly 80-100 it/s, depending on connectivity) and a high degree of frame randomization, key features for practical behavioral cloning (BC) algorithms, which may otherwise be slow or operate on highly non-i.i.d. data. This design improves accessibility: large datasets can be processed without large amounts of memory and storage.

Here are three ways to set up temporal windows depending on your use case: basic behavioral cloning, behavioral cloning with observation history, and action chunking. Skim the options and pick one to start; switching later is just a change to the delta_timestamps dictionary.

Basic Behavioral Cloning (learn current action from current observation):

from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Simple: current observation → current action
delta_timestamps = {
    "observation.images.wrist_camera": [0.0],  # just the current frame
    "action": [0.0],  # just the current action
}

dataset = LeRobotDataset(
    "lerobot/svla_so101_pickplace",
    delta_timestamps=delta_timestamps,
)
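
Behavioral Cloning with History (learn the current action from recent observations); a sketch reusing the [-0.2, -0.1, 0.0] offsets from the list above:

# History: the two previous frames plus the current one → current action
delta_timestamps = {
    "observation.images.wrist_camera": [-0.2, -0.1, 0.0],  # 200 ms ago, 100 ms ago, now
    "action": [0.0],  # just the current action
}

dataset = LeRobotDataset(
    "lerobot/svla_so101_pickplace",
    delta_timestamps=delta_timestamps,
)

Action Chunking (predict a short sequence of future actions); a sketch reusing the [0.0, 0.1, 0.2] offsets from the list above:

# Chunking: current observation → current action plus the next two (100 ms apart)
delta_timestamps = {
    "observation.images.wrist_camera": [0.0],  # just the current frame
    "action": [0.0, 0.1, 0.2],  # current and next two actions
}

dataset = LeRobotDataset(
    "lerobot/svla_so101_pickplace",
    delta_timestamps=delta_timestamps,
)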

Streaming Large Datasets

When to use streaming:

  • Dataset > available storage - Stream datasets that don’t fit on your disk
  • Experimentation - Quickly try different datasets without downloading
  • Cloud training - Reduce startup time by streaming from Hugging Face Hub
  • Network available - Streaming requires a stable internet connection during training

Performance: Streaming achieves 80-100 it/s with good connectivity! That is, on average, comparable to locally stored datasets once initialization overhead is factored out.


Download Dataset (faster training, requires storage):

from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Downloads dataset to local cache
dataset = LeRobotDataset("lerobot/svla_so101_pickplace")

# Fastest access after download
sample = dataset[100]
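
Stream Dataset (no download, requires network); a minimal sketch assuming StreamingLeRobotDataset is importable from lerobot.datasets.streaming_dataset (check the module path against your installed version):

from lerobot.datasets.streaming_dataset import StreamingLeRobotDataset

# Streams frames from the Hugging Face Hub instead of downloading them
dataset = StreamingLeRobotDataset("lerobot/svla_so101_pickplace")

# Streaming datasets are iterable-style: iterate instead of indexing
for sample in dataset:
    print(sample["action"].shape)
    break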

Training Integration

You can easily integrate both regular and streaming datasets with PyTorch data loaders, which makes plugging any LeRobotDataset into your own (PyTorch) training loop straightforward. Because every frame is fetched from the dataset as a tensor, the DataLoader can batch samples with no custom collation code. One caveat: streaming datasets are iterable-style, so drop shuffle=True when wrapping them in a DataLoader; streaming randomizes frames internally instead.

PyTorch DataLoader


import torch
from torch.utils.data import DataLoader

# Create DataLoader for training (shuffle=True works for downloaded datasets)
dataloader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
)

# Training loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for batch in dataloader:
    # Move to device
    observations = batch["observation.state"].to(device)
    actions = batch["action"].to(device)
    images = batch["observation.images.wrist_camera"].to(device)
    
    # Your model training here
    # loss = model(observations, images, actions)
    # loss.backward()
    # optimizer.step()

Why This Matters

This simple API hides significant complexity:

  • Multi-modal synchronization - Images and sensors perfectly aligned
  • Efficient storage - Compressed videos, memory-mapped arrays
  • Temporal handling - Easy access to observation/action sequences
  • Scalability - Same code works for small and massive datasets

Compare this to traditional robotics data handling, which often requires:

  • Custom parsers for each data format
  • Manual synchronization across modalities
  • Complex buffering for temporal windows
  • Platform-specific loading code

LeRobotDataset standardizes and simplifies all of this!

Section Quiz

Test your understanding of LeRobot and its role in robot learning:

1. What makes LeRobot different from traditional robotics libraries?

2. Which of the following is NOT a key component of LeRobot’s approach?

3. What is the main advantage of LeRobot’s optimized inference stack?

4. Which types of robotic platforms does LeRobot support?

5. What does “end-to-end integration with the robotics stack” mean in the context of LeRobot?

6. What is the primary purpose of the delta_timestamps parameter in LeRobotDataset?

7. Which of the following best describes the three main components of LeRobotDataset?

8. What happens when you use StreamingLeRobotDataset instead of LeRobotDataset?

9. In the context of robot learning, what does “temporal windowing” refer to?

10. What is the main advantage of LeRobotDataset’s approach to storing video data?

11. Which statement about LeRobotDataset’s compatibility is correct?

References

For a full list of references, check out the tutorial.

  • Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (2024)
    Cheng Chi et al.
    This paper introduces diffusion models for robot policy learning and discusses how temporal windowing and action chunking enable smooth visuomotor control.
    arXiv:2303.04137

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (2023)
    Anthony Brohan et al.
    Demonstrates how vision-language models can be fine-tuned for robotic control, including discussion of temporal context windows and action prediction horizons.
    arXiv:2307.15818
