ACT (Action Chunking with Transformers)
ACT is a lightweight and efficient policy for imitation learning, especially well-suited for fine-grained manipulation tasks. It’s the first model we recommend when you’re starting out with LeRobot due to its fast training time, low computational requirements, and strong performance.
Watch this tutorial from the LeRobot team to learn how ACT works: LeRobot ACT Tutorial
Model Overview
Action Chunking with Transformers (ACT) was introduced in the paper Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware by Zhao et al. The policy was designed to enable precise, contact-rich manipulation tasks using affordable hardware and minimal demonstration data.
Why ACT is Great for Beginners
ACT stands out as an excellent starting point for several reasons:
- Fast Training: Trains in a few hours on a single GPU
- Lightweight: Only ~80M parameters, making it efficient and easy to work with
- Data Efficient: Often achieves high success rates with just 50 demonstrations
Architecture
ACT uses a transformer-based architecture with three main components:
- Vision Backbone: ResNet-18 processes images from multiple camera viewpoints
- Transformer Encoder: Synthesizes information from camera features, joint positions, and a learned latent variable
- Transformer Decoder: Generates coherent action sequences using cross-attention
The policy takes as input:
- Multiple RGB images (e.g., from wrist cameras, front/top cameras)
- Current robot joint positions
- A latent style variable z (learned during training, set to zero during inference)
It outputs a chunk of k future actions.
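To make the data flow concrete, here is a minimal PyTorch sketch of how the three components fit together. It is illustrative only: the layer counts and dimensions follow the paper's defaults, but the single camera, the omitted positional embeddings, and the omitted CVAE encoder that infers z during training are all simplifications, and none of the names below come from the LeRobot source.

import torch
import torch.nn as nn
import torchvision

class ACTSketch(nn.Module):
    """Illustrative wiring of ACT's components; not the LeRobot implementation."""

    def __init__(self, state_dim=14, action_dim=14, latent_dim=32,
                 hidden_dim=512, chunk_size=100):
        super().__init__()
        # 1. Vision backbone: ResNet-18 with the classification head removed,
        #    so it yields a spatial feature map instead of class logits.
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.img_proj = nn.Conv2d(512, hidden_dim, kernel_size=1)
        # Tokens for the robot's joint positions and the latent style variable z.
        self.state_proj = nn.Linear(state_dim, hidden_dim)
        self.latent_proj = nn.Linear(latent_dim, hidden_dim)
        # 2. + 3. Transformer encoder/decoder (4 encoder and 7 decoder layers,
        #    as in the paper). Positional embeddings are omitted for brevity.
        self.transformer = nn.Transformer(
            d_model=hidden_dim, nhead=8,
            num_encoder_layers=4, num_decoder_layers=7, batch_first=True)
        # One learned query per action in the chunk; the decoder cross-attends
        # from these queries into the encoded observation tokens.
        self.action_queries = nn.Parameter(torch.randn(chunk_size, hidden_dim))
        self.action_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, image, joint_pos, z):
        # image: (B, 3, H, W); a real policy concatenates features from all cameras.
        feats = self.img_proj(self.backbone(image))          # (B, D, h, w)
        feats = feats.flatten(2).transpose(1, 2)             # (B, h*w, D)
        state_tok = self.state_proj(joint_pos).unsqueeze(1)  # (B, 1, D)
        latent_tok = self.latent_proj(z).unsqueeze(1)        # (B, 1, D)
        tokens = torch.cat([latent_tok, state_tok, feats], dim=1)
        queries = self.action_queries.expand(image.shape[0], -1, -1)
        decoded = self.transformer(tokens, queries)          # (B, chunk_size, D)
        return self.action_head(decoded)                     # (B, chunk_size, action_dim)

A single forward pass yields the whole chunk; at control time the policy executes these k actions (or a temporally smoothed version of them) before querying the network again, which is what makes inference cheap.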
Installation Requirements
- Install LeRobot by following our Installation Guide.
- ACT is included in the base LeRobot installation, so no additional dependencies are needed!
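For a quick start, the base package can typically be installed straight from PyPI (the Installation Guide covers source installs and optional extras):

pip install lerobot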
Training ACT
ACT works seamlessly with the standard LeRobot training pipeline. Here’s a complete example for training ACT on your dataset:
lerobot-train \
--dataset.repo_id=${HF_USER}/your_dataset \
--policy.type=act \
--output_dir=outputs/train/act_your_dataset \
--job_name=act_your_dataset \
--policy.device=cuda \
--wandb.enable=true \
--policy.repo_id=${HF_USER}/act_policy
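If a run is interrupted, it can be resumed from the last checkpoint. The checkpoint path below assumes the default layout under output_dir; adjust it to match your run:

lerobot-train \
--config_path=outputs/train/act_your_dataset/checkpoints/last/pretrained_model/train_config.json \
--resume=true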
Training Tips
- Start with defaults: ACT’s default hyperparameters work well for most tasks
- Training duration: Expect a few hours for 100k training steps on a single GPU
- Batch size: Start with batch size 8 and adjust based on your GPU memory (see the override example after this list)
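For example, the step count and batch size from the tips above can be overridden directly on the command line (flag names follow the standard lerobot-train configuration; run lerobot-train --help to confirm them for your version):

lerobot-train \
--dataset.repo_id=${HF_USER}/your_dataset \
--policy.type=act \
--output_dir=outputs/train/act_your_dataset \
--policy.device=cuda \
--batch_size=8 \
--steps=100000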
Train using Google Colab
If your local computer doesn’t have a powerful GPU, you can use Google Colab to train your model by following the ACT training notebook.
Evaluating ACT
Once training is complete, you can evaluate your ACT policy by passing it to the lerobot-record command. This will run inference and record evaluation episodes:
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=my_robot \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--display_data=true \
--dataset.repo_id=${HF_USER}/eval_act_your_dataset \
--dataset.num_episodes=10 \
--dataset.single_task="Your task description" \
--policy.path=${HF_USER}/act_policy
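If you prefer to run inference programmatically, a trained checkpoint can also be loaded directly in Python. This is a minimal sketch: the import path changes across LeRobot versions, and the observation keys and shapes below are placeholders that must match your dataset's features.

import torch
from lerobot.policies.act.modeling_act import ACTPolicy  # path varies by LeRobot version

policy = ACTPolicy.from_pretrained("your-hf-user/act_policy")
policy.eval()

# Build one observation; keys and shapes are placeholders for your robot's features.
batch = {
    "observation.images.front": torch.rand(1, 3, 480, 640),  # dummy camera frame in [0, 1]
    "observation.state": torch.zeros(1, 6),                  # dummy joint positions
}
with torch.no_grad():
    # select_action returns one action at a time, popping from the current
    # chunk and predicting a new chunk when it is exhausted.
    action = policy.select_action(batch)
print(action.shape)  # (1, action_dim)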