JHeisler/aloha_solo_left_act_diffusion

@@ -1,14 +1,86 @@
 ---
 library_name: lerobot
 tags:
 - act
 - diffusion
-- model_hub_mixin
-- pytorch_model_hub_mixin
 - robotics
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: https://github.com/huggingface/lerobot
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

 ---
 library_name: lerobot
+license: apache-2.0
+pipeline_tag: robotics
 tags:
 - act
 - diffusion
 - robotics
+- imitation-learning
+- behavior-cloning
+- aloha
+- pytorch_model_hub_mixin
+- model_hub_mixin
+datasets:
+- JHeisler/aloha_solo_left_4_6_26
 ---
+# Hybrid ACT+Diffusion — ALOHA Single-Arm (Left) — 13.4k steps
+Custom **HybridACTDiffusion** policy: an ACT visual encoder (ResNet18 backbone followed by a 4-layer Transformer, mean-pooled over its output tokens) produces a global conditioning vector for a Diffusion U-Net decoder (FiLM conditioning, DDPM training, 10-step DDIM inference). There is no VAE: the diffusion decoder models multimodal action distributions directly.
+This is the **initial 13.4k-step Hybrid baseline (S002)**. For the longer 40k retrain, see [JHeisler/aloha_solo_left_act_diffusion_40k](https://huggingface.co/JHeisler/aloha_solo_left_act_diffusion_40k).
+## Architecture
+```
+Images (cam_high, cam_left_wrist) + State (dim=9)
+     │
+     ▼
+ACT Encoder (ResNet18 → 4-layer Transformer) → mean-pool → (B, 512) global cond vector
+     │
+     ▼
+Diffusion U-Net (DiffusionConditionalUnet1d, FiLM modulation, down_dims=(256,512))
+     │  DDPM training / DDIM 10-step inference
+     ▼
+Action chunks (chunk_size=100, action_dim=9)
+```
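To make the shapes in the diagram concrete, here is a minimal NumPy sketch of the two conditioning steps: mean-pooling encoder tokens into a `(B, 512)` vector, then using that vector for FiLM (a per-channel scale and shift) on a 1D feature map. The function names, the single linear projection, and the token count are illustrative assumptions, not the project's actual `hybrid_act_diffusion` code.

```python
import numpy as np

def mean_pool_tokens(tokens):
    """Collapse encoder tokens (B, T, D) into one global vector per sample (B, D)."""
    return tokens.mean(axis=1)

def film(features, cond, weight, bias):
    """FiLM: scale and shift each channel of `features` using `cond`.

    features: (B, C, L) feature map inside the U-Net (hypothetical layout)
    cond:     (B, D) global conditioning vector
    weight:   (D, 2*C) projection producing [scale | shift]; bias: (2*C,)
    """
    channels = features.shape[1]
    scale_shift = cond @ weight + bias          # (B, 2*C)
    scale = scale_shift[:, :channels, None]     # (B, C, 1), broadcast over L
    shift = scale_shift[:, channels:, None]     # (B, C, 1)
    return scale * features + shift

rng = np.random.default_rng(0)
B, T, D, C, L = 2, 7, 512, 256, 100             # T (token count) is made up
tokens = rng.standard_normal((B, T, D))
cond = mean_pool_tokens(tokens)                 # (2, 512)
features = rng.standard_normal((B, C, L))
weight = rng.standard_normal((D, 2 * C)) * 0.01
bias = np.zeros(2 * C)
out = film(features, cond, weight, bias)
print(cond.shape, out.shape)                    # (2, 512) (2, 256, 100)
```

Because the conditioning is a single global vector, every position along the action-chunk axis receives the same scale and shift; only the channels are modulated differently.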
+## Training Config
+| Field | Value |
+|---|---|
+| Architecture | HybridACTDiffusion (ACT encoder + Diffusion U-Net) — see `lerobot/common/policies/hybrid_act_diffusion/` |
+| Dataset | [JHeisler/aloha_solo_left_4_6_26](https://huggingface.co/datasets/JHeisler/aloha_solo_left_4_6_26) — 50 episodes, 29,785 samples, 30 fps |
+| State / action dim | 9 / 9 |
+| Cameras | `cam_high`, `cam_left_wrist` (3×480×640 each) |
+| Steps | 13,400 |
+| Batch size | 24 (best setting in the design-of-experiments (DOE) comparison) |
+| Learning rate | 3e-5 |
+| Total samples seen | ~321K (13,400 × 24 = 321,600; ~10.8 epochs over 29,785 samples) |
+| AMP | enabled |
+| torch.compile | enabled |
+| Diffusion scheduler | DDPM training (100 timesteps, squaredcos_cap_v2), DDIM at inference (10 steps) |
+| Final loss (DDPM noise-pred MSE) | 0.011–0.020 |
+| Final grad norm | 0.2–0.7 |
+| Wall clock | ~1h 16min on RTX A4500 |
+| LeRobot pin | `96c7052777aca85d4e55dfba8f81586103ba8f61` (with custom hybrid_act_diffusion policy added) |
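The scheduler row can be illustrated with a small NumPy sketch: a 100-step cosine beta schedule of the kind the `squaredcos_cap_v2` name usually refers to, the DDPM forward-noising step used to build training targets, and a 10-step subset of timesteps for DDIM sampling. The formula and the evenly strided timestep selection are assumptions based on the common formulation; they have not been checked against this project's training code.

```python
import math
import numpy as np

def cosine_betas(num_train_timesteps=100, max_beta=0.999):
    """Cosine noise schedule: betas derived from a squared-cosine alpha_bar curve."""
    def alpha_bar(t):
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        # Cap each beta so the final steps do not blow up.
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)

def add_noise(x0, noise, t, alphas_cumprod):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1 - abar) * noise

def ddim_timesteps(num_train_timesteps=100, num_inference_steps=10):
    """Evenly strided subset of training timesteps, visited from noisy to clean."""
    stride = num_train_timesteps // num_inference_steps
    return (np.arange(num_inference_steps) * stride)[::-1]

betas = cosine_betas()
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
action_chunk = rng.standard_normal((100, 9))    # (chunk_size, action_dim)
noise = rng.standard_normal(action_chunk.shape)
noisy = add_noise(action_chunk, noise, t=50, alphas_cumprod=alphas_cumprod)

timesteps = ddim_timesteps()
print(timesteps.tolist())  # [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
```

During training the network would be asked to predict `noise` from `noisy`, and the MSE between the two is the loss reported in the table; at inference only the 10 listed timesteps are visited instead of all 100.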
+## Project Lineage
+| Workstream | Model | Steps | Samples | HF |
+|---|---|---|---|---|
+| S001 | ACT | 13,400 | 640K | [act_left](https://huggingface.co/JHeisler/aloha_solo_left_4_6_26_act_left) |
+| **S002** | **Hybrid ACT+Diffusion** | **13,400** | **321K** | **this repo** |
+| S003 | ACT (shipped) | 40,000 | 1.92M | [act_left_40k](https://huggingface.co/JHeisler/aloha_solo_left_4_6_26_act_left_40k) |
+| S004 | Hybrid ACT+Diffusion | 40,000 | 1.12M | [act_diffusion_40k](https://huggingface.co/JHeisler/aloha_solo_left_act_diffusion_40k) |
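The Samples column is approximately steps times batch size, so the figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated on this card; the "implied samples per step" values are derived from the rounded table entries and are not reported batch sizes. A logged epoch count may differ slightly depending on how the dataloader counts samples.

```python
def samples_seen(steps, batch_size):
    """Total training samples drawn over a run."""
    return steps * batch_size

def epochs(samples, dataset_size):
    """Number of passes over the dataset, assuming each sample is drawn once per epoch."""
    return samples / dataset_size

# This model (S002): 13,400 steps at batch size 24 on 29,785 samples.
total = samples_seen(13_400, 24)
passes = epochs(total, 29_785)
print(total, round(passes, 1))  # 321600 10.8

# Implied samples per step for each lineage row (rounded table values).
lineage = {
    "S001": (13_400, 640_000),
    "S002": (13_400, 321_000),
    "S003": (40_000, 1_920_000),
    "S004": (40_000, 1_120_000),
}
implied = {name: round(samples / steps) for name, (steps, samples) in lineage.items()}
print(implied)  # {'S001': 48, 'S002': 24, 'S003': 48, 'S004': 28}
```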
+## Notes on loss comparability
+The DDPM noise-prediction MSE reported for this model and ACT's combined L1 + KL loss (S001/S003) are different objectives on different scales, so absolute loss values are NOT directly comparable across architectures. A fair comparison uses a shared metric: offline action L1 error on held-out episodes, or success rate in real-robot rollouts.
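As a rough illustration of the offline metric, the sketch below computes mean absolute error between predicted and recorded action chunks, optionally ignoring padded timesteps at the end of an episode. The function name, array layout, and padding mask are assumptions made for the example; this is not the project's evaluation script.

```python
import numpy as np

def offline_action_l1(pred_chunks, true_chunks, pad_mask=None):
    """Mean absolute action error over held-out chunks.

    pred_chunks, true_chunks: (N, horizon, action_dim)
    pad_mask: optional (N, horizon) boolean array, True where the step is padding
    """
    pred = np.asarray(pred_chunks, dtype=np.float64)
    true = np.asarray(true_chunks, dtype=np.float64)
    abs_err = np.abs(pred - true)
    if pad_mask is None:
        return float(abs_err.mean())
    valid = ~np.asarray(pad_mask, dtype=bool)
    return float(abs_err[valid].mean())   # average over valid steps only

# Toy data: 4 chunks of 100 steps with 9 action dimensions.
true = np.zeros((4, 100, 9))
pred = np.ones((4, 100, 9))
print(offline_action_l1(pred, true))            # 1.0

# Large errors on padded steps should not affect the score.
pad = np.zeros((4, 100), dtype=bool)
pad[:, 50:] = True
pred_padded = pred.copy()
pred_padded[:, 50:, :] = 100.0
print(offline_action_l1(pred_padded, true, pad))  # 1.0
```

Running the same function on predictions from both the ACT and Hybrid checkpoints, over the same held-out episodes, gives numbers that can be compared directly, which the training losses cannot.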
+## Usage
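+The custom policy class lives in this project's LeRobot fork, so a stock LeRobot install is not enough. To load the checkpoint:
+```python
+# Requires lerobot pinned to 96c7052 with the hybrid_act_diffusion policy package added.
+from lerobot.common.policies.hybrid_act_diffusion.modeling_hybrid_act_diffusion import HybridACTDiffusionPolicy
+
+# Download the config and weights from the Hub and build the policy.
+policy = HybridACTDiffusionPolicy.from_pretrained("JHeisler/aloha_solo_left_act_diffusion")
+policy.eval()  # inference mode; assumes the policy is a torch.nn.Module
+```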
+## Citation / Course
+EN.525.681 school project — JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.
+Code reference: [HuggingFace LeRobot](https://github.com/huggingface/lerobot) at commit `96c7052` with custom hybrid policy package.