ACT Cutlery Arrangement — chunk_size=50 (act_local_v2)

Training context

Trained: Mon May 25 01:10:00 PM PST 2026
Dataset: AI-Final/AI-aiCapstoneData-lerobot-cutlery-v2 (v2)
- 200 scenes with equitable spatial distribution (vs 76 in v1)
- 127 successful demos at datagen time (~64% success rate)
State machine: corrected (fork → +x of plate, knife → -x of plate). The previous v1 dataset (SchindleriaPraematurus/aiCapstoneData-lerobot-cutlery) was generated with a sign-inverted success criterion, so policies trained on it learned the opposite placement from the one the leaderboard tests.
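
The sign convention above can be illustrated with a minimal sketch. The class and function names, the use of a single x-coordinate per object, and the absence of any tolerance are assumptions made for illustration; the actual success criterion in the task file is likely more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenePositions:
    """Final x-coordinates of the objects (hypothetical frame and units)."""
    plate_x: float
    fork_x: float
    knife_x: float


def corrected_success(s: ScenePositions) -> bool:
    # Corrected convention from the text: fork on the +x side of the plate,
    # knife on the -x side.
    return s.fork_x > s.plate_x and s.knife_x < s.plate_x


def inverted_success(s: ScenePositions) -> bool:
    # Sign-inverted convention, as described for the v1 dataset.
    return s.fork_x < s.plate_x and s.knife_x > s.plate_x


# A placement that satisfies the corrected criterion fails the inverted one.
scene = ScenePositions(plate_x=0.0, fork_x=0.12, knife_x=-0.12)
print(corrected_success(scene), inverted_success(scene))
```

Under these assumptions the two criteria can never both hold for the same scene, which is why demonstrations filtered with the inverted check teach the mirrored placement.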

Architecture & hyperparameters


Setting	Value
Policy	ACT (Action Chunking Transformer)
Steps	80,000
Batch size	8
chunk_size	50
Save frequency	every 20k steps
Hardware	NVIDIA GeForce RTX 3060 (12 GB)
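
As a rough sketch, the settings in the table imply the following training budget. The TrainConfig class is hypothetical (it is not the lerobot configuration object), and the checkpoint list assumes a checkpoint is written at every multiple of the save frequency, including the final step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    policy: str = "act"
    steps: int = 80_000
    batch_size: int = 8
    chunk_size: int = 50
    save_freq: int = 20_000

    def samples_seen(self) -> int:
        # Total training samples drawn over the run (with repetition).
        return self.steps * self.batch_size

    def checkpoint_steps(self) -> List[int]:
        # Assumes one checkpoint at each multiple of save_freq up to `steps`.
        return list(range(self.save_freq, self.steps + 1, self.save_freq))


cfg = TrainConfig()
print(cfg.samples_seen())
print(cfg.checkpoint_steps())
```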

Experiment design — chunk_size A/B

Two policies were trained as a clean A/B test, differing only in chunk_size. The leaderboard eval calls the policy with --policy_action_horizon=1 (the policy re-plans at every step), so the question is whether shorter or longer training chunks produce better single-step actions.

Sibling repo	chunk_size
AI-Final/cutlery-act-v2-chunk100	100 (lerobot default)
AI-Final/cutlery-act-v2-chunk50	50 (shorter chunks)
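
To make the interaction between chunk_size and an action horizon of 1 concrete, here is a toy sketch of a re-planning loop. toy_policy and rollout are hypothetical stand-ins, not the leaderboard harness or the ACT implementation; the sketch only counts how many predicted actions are executed versus discarded.

```python
from typing import Dict, List


def toy_policy(obs: float, chunk_size: int) -> List[float]:
    # Hypothetical stand-in for a chunked policy: predicts `chunk_size`
    # future actions from the current observation.
    return [obs + k for k in range(chunk_size)]


def rollout(chunk_size: int, action_horizon: int, n_steps: int) -> Dict[str, int]:
    """Execute `action_horizon` actions from each predicted chunk, then re-plan."""
    obs = 0.0
    calls = executed = discarded = 0
    while executed < n_steps:
        chunk = toy_policy(obs, chunk_size)
        calls += 1
        take = min(action_horizon, n_steps - executed)
        for action in chunk[:take]:
            obs = action  # toy dynamics: the observation is the last action
            executed += 1
        discarded += len(chunk) - take
    return {"policy_calls": calls, "executed": executed, "discarded": discarded}


# With an action horizon of 1, only the first action of every chunk is used.
print(rollout(chunk_size=50, action_horizon=1, n_steps=10))
print(rollout(chunk_size=100, action_horizon=1, n_steps=10))
```

In this setting every predicted action after the first is thrown away, so the A/B test effectively measures how chunk length during training affects the quality of that first action.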

Eval

7-episode rollout on eval/cutlery_arrangement_eval.py (the task file that mirrors the leaderboard).

Result line: [Evaluation] Final success rate: 0.000 [0/7]
Video: eval_rollout_7_episodes.mp4 (side-by-side wrist + front cameras)
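
For comparing runs, the result line can be parsed with a small helper. This is a sketch that assumes the line always has exactly the format shown above; parse_result is a hypothetical name, not part of the eval script.

```python
import re
from typing import Tuple

# Matches lines such as: [Evaluation] Final success rate: 0.000 [0/7]
RESULT_RE = re.compile(
    r"\[Evaluation\] Final success rate: (?P<rate>\d+\.\d+) "
    r"\[(?P<ok>\d+)/(?P<total>\d+)\]"
)


def parse_result(line: str) -> Tuple[float, int, int]:
    """Return (reported_rate, successes, episodes) from a result line."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError(f"not a result line: {line!r}")
    return float(match["rate"]), int(match["ok"]), int(match["total"])


rate, ok, total = parse_result("[Evaluation] Final success rate: 0.000 [0/7]")
print(rate, ok, total)
```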

Known ceiling

Even at perfect imitation, the underlying state machine has:

- ~33% baseline failure on non-tight-pair scenes (fixed-phase timing / IK)
- 25 s worst-case completion vs 20 s leaderboard cap
- 9 cm release height, which makes the cutlery bounce

A successful run here therefore shows that the policy imitates the state machine; it does not show that the policy exceeds the FSM's intrinsic limitations.
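
The first two limits can be turned into numbers with a short sketch. The constants are copied from the list above; the helper names are hypothetical, the failure rate is approximate and applies only to non-tight-pair scenes, and the timing figure is a worst case rather than a typical run.

```python
BASELINE_FAILURE = 0.33   # approximate FSM failure rate, non-tight-pair scenes
WORST_CASE_S = 25.0       # worst-case FSM completion time, seconds
TIME_CAP_S = 20.0         # leaderboard time cap, seconds


def success_ceiling(failure_rate: float) -> float:
    # Best success rate a perfect imitator could inherit from the FSM.
    return 1.0 - failure_rate


def overrun_seconds(worst_case_s: float, cap_s: float) -> float:
    # How far the worst-case episode runs past the cap (0 if it fits).
    return max(0.0, worst_case_s - cap_s)


print(f"ceiling on non-tight-pair scenes: {success_ceiling(BASELINE_FAILURE):.2f}")
print(f"worst-case overrun: {overrun_seconds(WORST_CASE_S, TIME_CAP_S):.1f} s")
```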


Safetensors

Model size: 51.6M params
Tensor type: F32
