JHeisler committed on
Commit
5c7761b
·
verified ·
1 Parent(s): 58135f5

Update model card with accurate training metadata, lineage, and usage

Files changed (1)
  1. README.md +78 -6
README.md CHANGED
@@ -1,14 +1,86 @@
1
  ---
2
  library_name: lerobot
 
 
3
  tags:
4
  - act
5
  - diffusion
6
- - model_hub_mixin
7
- - pytorch_model_hub_mixin
8
  - robotics
9
  ---
10
 
11
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
12
- - Code: https://github.com/huggingface/lerobot
13
- - Paper: [More Information Needed]
14
- - Docs: [More Information Needed]
 
1
  ---
2
  library_name: lerobot
3
+ license: apache-2.0
4
+ pipeline_tag: robotics
5
  tags:
6
  - act
7
  - diffusion
 
 
8
  - robotics
9
+ - imitation-learning
10
+ - behavior-cloning
11
+ - aloha
12
+ - pytorch_model_hub_mixin
13
+ - model_hub_mixin
14
+ datasets:
15
+ - JHeisler/aloha_solo_left_4_6_26
16
  ---
17
 
18
+ # Hybrid ACT+Diffusion - ALOHA Single-Arm (Left) - 13.4k steps
19
+
20
+ Custom **HybridACTDiffusion** policy: an ACT visual encoder (ResNet18 + 4-layer Transformer, mean-pooled) feeds a Diffusion U-Net decoder (FiLM conditioning, DDPM training, 10-step DDIM inference). There is no VAE; the diffusion decoder models multimodal action distributions directly.
21
+
22
+ This is the **initial 13.4k-step Hybrid baseline (S002)**. For the longer 40k retrain, see [JHeisler/aloha_solo_left_act_diffusion_40k](https://huggingface.co/JHeisler/aloha_solo_left_act_diffusion_40k).
23
+
24
+ ## Architecture
25
+
26
+ ```
27
+ Images (cam_high, cam_left_wrist) + State (dim=9)
28
+ │
29
+ ▼
30
+ ACT Encoder (ResNet18 → 4-layer Transformer) → mean-pool → (B, 512) global cond vector
31
+ │
32
+ ▼
33
+ Diffusion U-Net (DiffusionConditionalUnet1d, FiLM modulation, down_dims=(256,512))
34
+ │ DDPM training / DDIM 10-step inference
35
+ ▼
36
+ Action chunks (chunk_size=100, action_dim=9)
37
+ ```
38
+
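The diagram above compresses two operations: mean-pooling the encoder tokens into one conditioning vector, and FiLM-modulating U-Net feature maps with it. A minimal pure-Python sketch of both follows. The function names `mean_pool` and `film` are hypothetical, the scale and shift are passed in directly rather than produced by a learned projection of the conditioning vector, and this card does not state whether the model applies scale and shift or shift only.

```python
def mean_pool(tokens):
    """Average a list of equal-length token vectors into a single vector."""
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def film(features, scale, shift):
    """Per-channel FiLM: out[c][t] = scale[c] * features[c][t] + shift[c].

    `features` is a list of channels, each a list of values over time.
    In a real model, `scale` and `shift` would come from a learned
    projection of the conditioning vector (assumption for illustration).
    """
    return [
        [g * x + b for x in channel]
        for channel, g, b in zip(features, scale, shift)
    ]


# Two tokens of width 2 pooled into one vector of width 2.
cond = mean_pool([[1.0, 2.0], [3.0, 4.0]])

# Two channels, two timesteps each, modulated per channel.
modulated = film([[1.0, 2.0], [3.0, 4.0]], scale=[2.0, 0.5], shift=[1.0, 0.0])
```

In the model described here the pooled vector would have width 512 and the modulated features would be the U-Net's 1D convolution activations over the 100-step action chunk.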
39
+ ## Training Config
40
+
41
+ | Field | Value |
42
+ |---|---|
43
+ | Architecture | HybridACTDiffusion (ACT encoder + Diffusion U-Net); see `lerobot/common/policies/hybrid_act_diffusion/` |
44
+ | Dataset | [JHeisler/aloha_solo_left_4_6_26](https://huggingface.co/datasets/JHeisler/aloha_solo_left_4_6_26) (50 episodes, 29,785 samples, 30 fps) |
45
+ | State / action dim | 9 / 9 |
46
+ | Cameras | `cam_high`, `cam_left_wrist` (3×480×640 each) |
47
+ | Steps | 13,400 |
48
+ | Batch size | 24 (DOE winner) |
49
+ | Learning rate | 3e-5 |
50
+ | Total samples seen | ~321K (~10.8 epochs) |
51
+ | AMP | enabled |
52
+ | torch.compile | enabled |
53
+ | Diffusion scheduler | DDPM training (100 timesteps, squaredcos_cap_v2), DDIM at inference (10 steps) |
54
+ | Final loss (DDPM noise-pred MSE) | 0.011–0.020 |
55
+ | Final grad norm | 0.2–0.7 |
56
+ | Wall clock | ~1h 16min on RTX A4500 |
57
+ | LeRobot pin | `96c7052777aca85d4e55dfba8f81586103ba8f61` (with custom hybrid_act_diffusion policy added) |
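
The "Total samples seen" row follows from the step count, batch size, and dataset size in the table. A small sketch of that arithmetic, with hypothetical helper names, assuming every step consumes one full batch:

```python
def samples_seen(steps, batch_size):
    """Total training samples drawn, assuming one full batch per step."""
    return steps * batch_size


def epochs_seen(steps, batch_size, dataset_size):
    """Number of passes over the dataset implied by the sample count."""
    return samples_seen(steps, batch_size) / dataset_size


total = samples_seen(13_400, 24)          # 321,600 samples
passes = epochs_seen(13_400, 24, 29_785)  # roughly 10.8 passes
```

The same helpers apply to the other rows of the lineage table once their batch sizes are known; those batch sizes are not stated in this card.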
58
+
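For readers unfamiliar with the scheduler row above, here is a sketch of the cosine noise schedule that the diffusers library exposes under the name `squaredcos_cap_v2`, together with one common way to pick 10 DDIM steps out of 100 training timesteps. The function names are hypothetical, and the evenly strided ("leading") timestep spacing is an assumption; the card does not say which spacing this policy uses at inference.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps=100, max_beta=0.999):
    """Cosine schedule: beta_i = 1 - alpha_bar(t_{i+1}) / alpha_bar(t_i), capped."""

    def alpha_bar(t):
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


def strided_timesteps(num_train_timesteps=100, num_inference_steps=10):
    """Evenly strided inference timesteps in descending order (assumed spacing)."""
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


betas = squaredcos_cap_v2_betas()
timesteps = strided_timesteps()  # [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
```

Training samples any of the 100 timesteps, while inference visits only the 10 strided ones, which is why sampling needs a tenth of the denoising passes.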
59
+ ## Project Lineage
60
+
61
+ | Workstream | Model | Steps | Samples | HF |
62
+ |---|---|---|---|---|
63
+ | S001 | ACT | 13,400 | 640K | [act_left](https://huggingface.co/JHeisler/aloha_solo_left_4_6_26_act_left) |
64
+ | **S002** | **Hybrid ACT+Diffusion** | **13,400** | **321K** | **this repo** |
65
+ | S003 | ACT (shipped) | 40,000 | 1.92M | [act_left_40k](https://huggingface.co/JHeisler/aloha_solo_left_4_6_26_act_left_40k) |
66
+ | S004 | Hybrid ACT+Diffusion | 40,000 | 1.12M | [act_diffusion_40k](https://huggingface.co/JHeisler/aloha_solo_left_act_diffusion_40k) |
67
+
68
+ ## Notes on loss comparability
69
+
70
+ DDPM noise-prediction MSE (this model) and ACT's combined L1+KL loss (S001/S003) are different loss surfaces, so absolute loss values are NOT directly comparable across architectures. A fairer comparison is offline action L1 on held-out episodes or real-robot rollout success rate.
71
+
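As an illustration of the offline metric suggested above, the following sketch computes mean action L1 between predicted and recorded action sequences. The function name and the nested-list layout (timesteps by action dimensions) are assumptions for illustration, not this project's evaluation code.

```python
def mean_action_l1(predicted, target):
    """Mean absolute error over all timesteps and action dimensions."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of timesteps")
    total, count = 0.0, 0
    for pred_step, true_step in zip(predicted, target):
        if len(pred_step) != len(true_step):
            raise ValueError("action dimensions must match")
        for p, t in zip(pred_step, true_step):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("no actions to compare")
    return total / count


# Two timesteps, two action dimensions each.
error = mean_action_l1([[0.0, 1.0], [2.0, 3.0]], [[0.5, 1.0], [2.0, 2.0]])
```

Because the metric lives in action space rather than loss space, it can be applied identically to the ACT and hybrid checkpoints on the same held-out episodes.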
72
+ ## Usage
73
+
74
+ The custom policy class lives in this project's LeRobot fork. To use:
75
+
76
+ ```python
77
+ # Requires lerobot pinned to 96c7052 with hybrid_act_diffusion policy package added
78
+ from lerobot.common.policies.hybrid_act_diffusion.modeling_hybrid_act_diffusion import HybridACTDiffusionPolicy
79
+ policy = HybridACTDiffusionPolicy.from_pretrained("JHeisler/aloha_solo_left_act_diffusion")
80
+ ```
81
+
82
+ ## Citation / Course
83
+
84
+ EN.525.681 school project, JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.
85
+
86
+ Code reference: [HuggingFace LeRobot](https://github.com/huggingface/lerobot) at commit `96c7052` with custom hybrid policy package.