ACT — ALOHA Single-Arm (Left) — Mask REMOVAL via Reversed Data — 13.4k steps

Action Chunking Transformer (ACT) policy for mask removal trained on a synthetic dataset derived by time-reversing the placement dataset. Each placement episode (gripper opens, releasing mask onto face) becomes a removal episode (gripper closes, picking mask off face) when reversed. Action sequences are reversed AND shifted by 1 step so each action targets the correct next state.

This is the 13.4k-step baseline (S005). For the deeper 40k retrain, see JHeisler/aloha_solo_left_act_removal_reversed_40k.

Training Config

Field	Value
Architecture	ACT (ResNet18 backbone + 4-layer Transformer encoder + VAE chunking head) — identical to placement model
Dataset	JHeisler/aloha_solo_left_4_6_26_reversed — 50 ep, 29,735 samples, 30 fps, time-reversed with 1-step action shift
State / action dim	9 / 9
Cameras	`cam_high`, `cam_left_wrist` (3×480×640 each)
Steps	13,400
Batch size	48
Learning rate	6e-5 (linear warmup 500 → cosine)
Total samples seen	~~643K (~~21 epochs)
AMP	enabled
torch.compile	enabled
Final loss	0.035–0.037
Final grad norm	0.5–0.8
Wall clock	2h 3min on RTX A4500 (matches placement S001 to the minute)
LeRobot pin	`96c7052777aca85d4e55dfba8f81586103ba8f61`

Project Lineage

Workstream	Task	Steps	Final loss	HF
S001	placement	13,400	0.029	act_left
S005	removal (reversed)	13,400	0.035	this repo
S003	placement (shipped)	40,000	0.015	act_left_40k
S006	removal (reversed)	40,000	0.018	act_removal_reversed_40k

S001 vs S005 isolates the dataset variable (forward placement vs reversed) at identical architecture and step count.

Caveats

Synthetic data. Trained on time-reversed placement, not native removal. Visual transitions are physically backwards (mask appears on face, arm pulls it off). Doesn't affect ACT's per-timestep predictions, but a policy trained on real removal data will likely outperform.
The "approach the table from above" segment of placement reversed becomes "place above table" — useful end state but not a removal-specific motion.
Use as a lower-bound baseline until native removal data is available.

Usage

from lerobot.common.policies.act.modeling_act import ACTPolicy
policy = ACTPolicy.from_pretrained("JHeisler/aloha_solo_left_act_removal_reversed_13k")

Citation / Course

EN.525.681 school project — JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.

Code reference: HuggingFace LeRobot at commit 96c7052.

Downloads last month: 39

Video Preview

Robotics

JHeisler
/

aloha_solo_left_act_removal_reversed_13k