ACT - ALOHA Single-Arm (Left) - Mask REMOVAL via Reversed Data - 13.4k steps
Action Chunking Transformer (ACT) policy for mask removal trained on a synthetic dataset derived by time-reversing the placement dataset. Each placement episode (gripper opens, releasing mask onto face) becomes a removal episode (gripper closes, picking mask off face) when reversed. Action sequences are reversed AND shifted by 1 step so each action targets the correct next state.
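The reversal-with-shift step can be sketched as follows. This is a minimal illustration, not the procedure actually used to build `JHeisler/aloha_solo_left_4_6_26_reversed`. The helper name `reverse_episode` is hypothetical. The sketch assumes each recorded action `a_t` roughly matches the state `s_t` (leader and follower joint positions sampled together). Under that assumption, the command pointing at the next reversed state is the reversed action one step later.

```python
import numpy as np


def reverse_episode(states: np.ndarray, actions: np.ndarray):
    """Hypothetical helper: time-reverse one episode with a 1-step action shift.

    states, actions: arrays of shape (T, 9), one row per 30 fps frame.
    Assumes action a_t ~= state s_t, so the action at reversed step tau is
    taken from reversed step tau + 1, i.e. it targets the next reversed state.
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    rev_states = states[::-1].copy()
    rev_actions = actions[::-1].copy()
    shifted = np.empty_like(rev_actions)
    shifted[:-1] = rev_actions[1:]  # step tau takes the action from tau + 1
    shifted[-1] = rev_actions[-1]   # final step: hold the last action (assumption)
    return rev_states, shifted
```

Holding the last action on the final frame keeps the episode length unchanged. Dropping that frame instead would be an equally reasonable choice.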
This is the 13.4k-step baseline (S005). For the longer 40k-step retrain (S006), see JHeisler/aloha_solo_left_act_removal_reversed_40k.
Training Config
| Field | Value |
|---|---|
| Architecture | ACT (ResNet18 backbone + 4-layer Transformer encoder + VAE chunking head), identical to the placement model |
| Dataset | JHeisler/aloha_solo_left_4_6_26_reversed: 50 episodes, 29,735 samples, 30 fps, time-reversed with a 1-step action shift |
| State / action dim | 9 / 9 |
| Cameras | cam_high, cam_left_wrist (3×480×640 each) |
| Steps | 13,400 |
| Batch size | 48 |
| Learning rate | 6e-5 (linear warmup over 500 steps, then cosine decay) |
| Total samples seen | 643,200 (13,400 steps × 48; about 21.6 epochs over 29,735 samples) |
| AMP | enabled |
| torch.compile | enabled |
| Final loss | 0.035 to 0.037 |
| Final grad norm | 0.5 to 0.8 |
| Wall clock | 2h 3min on RTX A4500 (matches placement S001 to the minute) |
| LeRobot pin | 96c7052777aca85d4e55dfba8f81586103ba8f61 |
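The step, batch, and learning-rate settings can be reproduced with a short sketch. It derives the samples-seen and epoch figures from the table. The cosine decay floor (0) and horizon (the full 13,400 steps) are assumptions, because the card does not state them. `lr_at` is a hypothetical helper, not LeRobot's scheduler.

```python
import math

# Values taken from the training config table.
STEPS = 13_400
BATCH_SIZE = 48
NUM_SAMPLES = 29_735  # 50 reversed episodes
PEAK_LR = 6e-5
WARMUP_STEPS = 500


def lr_at(step: int) -> float:
    """Hypothetical schedule: linear warmup to PEAK_LR, then cosine decay to 0.

    Decaying to exactly 0 at STEPS is an assumption; the card only says
    "linear warmup 500, then cosine".
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


samples_seen = STEPS * BATCH_SIZE      # 643,200
epochs = samples_seen / NUM_SAMPLES    # about 21.6 passes over the data

if __name__ == "__main__":
    print(f"samples seen: {samples_seen:,}  epochs: {epochs:.1f}")
    for s in (0, 250, 500, 7_000, 13_400):
        print(s, f"{lr_at(s):.2e}")
```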
Project Lineage
| Workstream | Task | Steps | Final loss | HF |
|---|---|---|---|---|
| S001 | placement | 13,400 | 0.029 | act_left |
| S005 | removal (reversed) | 13,400 | 0.035 | this repo |
| S003 | placement (shipped) | 40,000 | 0.015 | act_left_40k |
| S006 | removal (reversed) | 40,000 | 0.018 | act_removal_reversed_40k |
S001 vs S005 isolates the dataset variable (forward placement vs reversed) at identical architecture and step count.
Caveats
- Synthetic data. Trained on time-reversed placement, not native removal, so the visual transitions are physically backwards (mask appears on face, arm pulls it off). ACT predicts each action chunk from the current observation alone, so the backwards ordering does not enter its inputs directly. Even so, a policy trained on real removal data will likely outperform this one.
- When reversed, the "approach the table from above" segment of placement becomes a "place above table" segment: a useful end state, but not a removal-specific motion.
- Use as a lower-bound baseline until native removal data is available.
Usage
```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Download the checkpoint from the Hub and switch to inference mode.
policy = ACTPolicy.from_pretrained("JHeisler/aloha_solo_left_act_removal_reversed_13k")
policy.eval()
```
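The sketch below shows the input layout the policy expects, using the state dimension and camera shapes from the training config. The key names follow LeRobot's ALOHA dataset convention and are an assumption here, as is the float32, [0, 1], channel-first image format. Check the checkpoint's config for the exact input features. `make_dummy_observation` is a hypothetical helper for shape checks only.

```python
import numpy as np

# Shapes from the training config: 9-dim state, two 3x480x640 cameras.
STATE_DIM = 9
CAMERAS = ("cam_high", "cam_left_wrist")
IMAGE_SHAPE = (3, 480, 640)


def make_dummy_observation(batch_size: int = 1) -> dict:
    """Hypothetical helper: zero-filled observation batch for shape checks.

    Key names assume LeRobot's ALOHA convention; images are float32,
    channel-first, with a leading batch dimension.
    """
    obs = {"observation.state": np.zeros((batch_size, STATE_DIM), dtype=np.float32)}
    for cam in CAMERAS:
        obs[f"observation.images.{cam}"] = np.zeros(
            (batch_size, *IMAGE_SHAPE), dtype=np.float32
        )
    return obs
```

To query the policy, each array would be converted with `torch.from_numpy(...)` and the dict passed to `policy.select_action(obs)`. Calling `policy.reset()` at the start of each episode clears ACT's queued action chunk. Depending on the LeRobot version, dataset normalization may also need to be applied outside the policy.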
Citation / Course
EN.525.681 school project, JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.
Code reference: HuggingFace LeRobot at commit 96c7052.