ACT - ALOHA Single-Arm (Left) - Mask REMOVAL via Reversed Data - 13.4k steps

Action Chunking Transformer (ACT) policy for mask removal trained on a synthetic dataset derived by time-reversing the placement dataset. Each placement episode (gripper opens, releasing mask onto face) becomes a removal episode (gripper closes, picking mask off face) when reversed. Action sequences are reversed AND shifted by 1 step so each action targets the correct next state.
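The reverse-and-shift step can be illustrated with a minimal sketch. It is not the project's actual conversion script: `reverse_episode` is a hypothetical helper, and it assumes that each recorded action is an absolute position target aligned with the state at the same timestep, and that the final action is repeated to keep the episode length unchanged. Neither assumption is confirmed by this card.

```python
def reverse_episode(states, actions):
    """Time-reverse one episode and shift the actions by one step.

    Assumption (not stated in the source): actions[t] is an absolute
    position target aligned with states[t], so after reversal the action
    that leads to the next state sits one index later.
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    if not states:
        return [], []

    rev_states = states[::-1]
    rev_actions = actions[::-1]
    # Shift left by one step; repeat the final action so the length is
    # unchanged (the padding choice is an assumption).
    shifted_actions = rev_actions[1:] + rev_actions[-1:]
    return rev_states, shifted_actions


# Toy 1-D episode: a gripper opening from 0.0 (closed) to 1.0 (open).
states = [0.0, 0.25, 0.5, 0.75, 1.0]
actions = list(states)  # assumed alignment: actions[t] == states[t]

rev_states, rev_actions = reverse_episode(states, actions)

# Every action except the padded last one equals the next reversed state.
targets_next_state = all(
    rev_actions[k] == rev_states[k + 1] for k in range(len(rev_states) - 1)
)
```

Without the shift, each reversed action would point at the current state rather than the next one, so the policy would learn to hold position instead of moving.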

This is the 13.4k-step baseline (S005). For the deeper 40k retrain, see JHeisler/aloha_solo_left_act_removal_reversed_40k.

Training Config

| Field | Value |
|---|---|
| Architecture | ACT (ResNet18 backbone + 4-layer Transformer encoder + VAE chunking head), identical to placement model |
| Dataset | JHeisler/aloha_solo_left_4_6_26_reversed: 50 episodes, 29,735 samples, 30 fps, time-reversed with 1-step action shift |
| State / action dim | 9 / 9 |
| Cameras | cam_high, cam_left_wrist (3×480×640 each) |
| Steps | 13,400 |
| Batch size | 48 |
| Learning rate | 6e-5 (linear warmup 500 → cosine) |
| Total samples seen | 643K (21 epochs) |
| AMP | enabled |
| torch.compile | enabled |
| Final loss | 0.035–0.037 |
| Final grad norm | 0.5–0.8 |
| Wall clock | 2h 3min on RTX A4500 (matches placement S001 to the minute) |
| LeRobot pin | 96c7052777aca85d4e55dfba8f81586103ba8f61 |
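The sample and epoch figures in the config follow from the step count, batch size, and dataset size. The short calculation below reproduces them and also derives the mean episode length. It assumes every batch is full and that one epoch is one pass over all 29,735 samples. The variable names are illustrative only.

```python
# Values taken from the training config above.
steps = 13_400
batch_size = 48
dataset_samples = 29_735
episodes = 50
fps = 30

total_samples = steps * batch_size            # samples drawn during training
epochs = total_samples / dataset_samples      # passes over the dataset
mean_episode_frames = dataset_samples / episodes
mean_episode_seconds = mean_episode_frames / fps

print(f"samples seen: {total_samples:,}")     # samples seen: 643,200
print(f"epochs: {epochs:.1f}")                # epochs: 21.6
print(f"mean episode: {mean_episode_frames:.1f} frames, "
      f"{mean_episode_seconds:.1f} s")        # mean episode: 594.7 frames, 19.8 s
```

The "21 epochs" in the config is therefore the number of completed passes; training stops partway through the 22nd.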

Project Lineage

| Workstream | Task | Steps | Final loss | HF |
|---|---|---|---|---|
| S001 | placement | 13,400 | 0.029 | act_left |
| S005 | removal (reversed) | 13,400 | 0.035 | this repo |
| S003 | placement (shipped) | 40,000 | 0.015 | act_left_40k |
| S006 | removal (reversed) | 40,000 | 0.018 | act_removal_reversed_40k |

S001 vs S005 isolates the dataset variable (forward placement vs reversed) at identical architecture and step count.
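As a rough illustration of that comparison, the relative gap in final training loss can be computed from the lineage table at each step count. This is descriptive arithmetic only: the losses are training losses measured on different datasets, so the gap is not a measure of task success. `relative_gap` is a hypothetical helper introduced for this sketch.

```python
# Final training losses copied from the lineage table.
final_loss = {
    "S001": 0.029,  # placement, 13,400 steps
    "S005": 0.035,  # removal (reversed), 13,400 steps
    "S003": 0.015,  # placement, 40,000 steps
    "S006": 0.018,  # removal (reversed), 40,000 steps
}


def relative_gap(removal, placement):
    """Fractional increase of the removal loss over the placement loss."""
    return (removal - placement) / placement


gap_13k = relative_gap(final_loss["S005"], final_loss["S001"])
gap_40k = relative_gap(final_loss["S006"], final_loss["S003"])

print(f"13.4k steps: {gap_13k:.0%} higher")  # 13.4k steps: 21% higher
print(f"40k steps:   {gap_40k:.0%} higher")  # 40k steps:   20% higher
```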

Caveats

  • Synthetic data. The policy was trained on time-reversed placement episodes, not native removal. Visual transitions are physically backwards (mask appears on face, arm pulls it off). This does not affect ACT's per-timestep predictions, but a policy trained on real removal data will likely outperform this one.
  • Reversing the "approach the table from above" segment of placement yields "place above table": a useful end state, but not a removal-specific motion.
  • Use as a lower-bound baseline until native removal data is available.

Usage

from lerobot.common.policies.act.modeling_act import ACTPolicy
policy = ACTPolicy.from_pretrained("JHeisler/aloha_solo_left_act_removal_reversed_13k")

Citation / Course

EN.525.681 school project, JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.

Code reference: HuggingFace LeRobot at commit 96c7052.
