Robotics
LeRobot
Safetensors
act
diffusion
imitation-learning
behavior-cloning
aloha
pytorch_model_hub_mixin
model_hub_mixin
Update model card with accurate training metadata, lineage, and usage
README.md
CHANGED
|
@@ -1,14 +1,86 @@
|
|
| 1 |
---
|
| 2 |
library_name: lerobot
|
| 3 |
tags:
|
| 4 |
- act
|
| 5 |
- diffusion
|
| 6 |
-
- model_hub_mixin
|
| 7 |
-
- pytorch_model_hub_mixin
|
| 8 |
- robotics
|
| 9 |
---
|
| 10 |
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 1 |
---
|
| 2 |
library_name: lerobot
|
| 3 |
+
license: apache-2.0
|
| 4 |
+
pipeline_tag: robotics
|
| 5 |
tags:
|
| 6 |
- act
|
| 7 |
- diffusion
|
| 8 |
- robotics
|
| 9 |
+
- imitation-learning
|
| 10 |
+
- behavior-cloning
|
| 11 |
+
- aloha
|
| 12 |
+
- pytorch_model_hub_mixin
|
| 13 |
+
- model_hub_mixin
|
| 14 |
+
datasets:
|
| 15 |
+
- JHeisler/aloha_solo_left_4_6_26
|
| 16 |
---
|
| 17 |
|
| 18 |
+
# Hybrid ACT+Diffusion: ALOHA Single-Arm (Left), 13.4k steps
|
| 19 |
+
|
| 20 |
+
Custom **HybridACTDiffusion** policy: an ACT visual encoder (ResNet18 + 4-layer Transformer, mean-pooled) feeds a Diffusion U-Net decoder (FiLM conditioning, DDPM training, 10-step DDIM inference). There is no VAE; the diffusion decoder models multimodal action distributions directly.
|
| 21 |
+
|
| 22 |
+
This is the **initial 13.4k-step Hybrid baseline (S002)**. For the longer 40k retrain, see [JHeisler/aloha_solo_left_act_diffusion_40k](https://huggingface.co/JHeisler/aloha_solo_left_act_diffusion_40k).
|
| 23 |
+
|
| 24 |
+
## Architecture
|
| 25 |
+
|
| 26 |
+
```
|
| 27 |
+
Images (cam_high, cam_left_wrist) + State (dim=9)
|
| 28 |
+
        │
|
| 29 |
+
        ▼
|
| 30 |
+
ACT Encoder (ResNet18 → 4-layer Transformer) → mean-pool → (B, 512) global cond vector
|
| 31 |
+
        │
|
| 32 |
+
        ▼
|
| 33 |
+
Diffusion U-Net (DiffusionConditionalUnet1d, FiLM modulation, down_dims=(256,512))
|
| 34 |
+
        │ DDPM training / DDIM 10-step inference
|
| 35 |
+
        ▼
|
| 36 |
+
Action chunks (chunk_size=100, action_dim=9)
|
| 37 |
+
```
|
| 38 |
+
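As a rough illustration of the scheduler settings named in this card (100 training timesteps with the `squaredcos_cap_v2` cosine schedule, 10 DDIM steps at inference), the sketch below computes the betas with the standard cosine formula and picks an evenly spaced subsequence of timesteps. The function names `cosine_betas` and `ddim_timesteps` are hypothetical helpers, not part of this repo, and the even "leading" spacing is an assumption about how the 10 inference steps are chosen.

```python
import math


def cosine_betas(num_timesteps: int, max_beta: float = 0.999) -> list[float]:
    """Betas for the squared-cosine (capped) noise schedule.

    alpha_bar(u) = cos((u + 0.008) / 1.008 * pi / 2) ** 2 for u in [0, 1];
    each beta is 1 - alpha_bar(t2) / alpha_bar(t1), capped at max_beta.
    """
    def alpha_bar(u: float) -> float:
        return math.cos((u + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_timesteps):
        t1 = i / num_timesteps
        t2 = (i + 1) / num_timesteps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


def ddim_timesteps(num_train: int, num_inference: int) -> list[int]:
    """Evenly spaced training timesteps, visited from noisiest to cleanest.

    Assumption: simple integer striding; the real policy may space them differently.
    """
    stride = num_train // num_inference
    return [i * stride for i in range(num_inference)][::-1]


betas = cosine_betas(100)            # 100 training timesteps, as in the card
timesteps = ddim_timesteps(100, 10)  # 10 denoising steps at inference
```

Under these assumptions, inference visits only one in ten of the training timesteps, which is where the speedup of DDIM over full DDPM sampling comes from.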
|
| 39 |
+
## Training Config
|
| 40 |
+
|
| 41 |
+
| Field | Value |
|
| 42 |
+
|---|---|
|
| 43 |
+
| Architecture | HybridACTDiffusion (ACT encoder + Diffusion U-Net); see `lerobot/common/policies/hybrid_act_diffusion/` |
|
| 44 |
+
| Dataset | [JHeisler/aloha_solo_left_4_6_26](https://huggingface.co/datasets/JHeisler/aloha_solo_left_4_6_26): 50 episodes, 29,785 samples, 30 fps |
|
| 45 |
+
| State / action dim | 9 / 9 |
|
| 46 |
+
| Cameras | `cam_high`, `cam_left_wrist` (3×480×640 each) |
|
| 47 |
+
| Steps | 13,400 |
|
| 48 |
+
| Batch size | 24 (DOE winner) |
|
| 49 |
+
| Learning rate | 3e-5 |
|
| 50 |
+
| Total samples seen | ~321K (~10.6 epochs) |
|
| 51 |
+
| AMP | enabled |
|
| 52 |
+
| torch.compile | enabled |
|
| 53 |
+
| Diffusion scheduler | DDPM training (100 timesteps, squaredcos_cap_v2), DDIM at inference (10 steps) |
|
| 54 |
+
| Final loss (DDPM noise-pred MSE) | 0.011–0.020 |
|
| 55 |
+
| Final grad norm | 0.2–0.7 |
|
| 56 |
+
| Wall clock | ~1h 16min on RTX A4500 |
|
| 57 |
+
| LeRobot pin | `96c7052777aca85d4e55dfba8f81586103ba8f61` (with custom hybrid_act_diffusion policy added) |
|
| 58 |
+
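The derived figures in the table can be cross-checked from the step count, batch size, dataset size, and wall clock listed there. The sketch below uses only those listed values; the variable names are illustrative. The epoch count comes out near 10.8, slightly above the ~10.6 listed, and the card does not say how the listed figure was rounded or whether some samples were excluded.

```python
# Values copied from the Training Config table.
steps = 13_400
batch_size = 24
dataset_samples = 29_785
wall_clock_seconds = 1 * 3600 + 16 * 60  # ~1h 16min

# Total samples seen is steps times batch size.
samples_seen = steps * batch_size

# Passes over the dataset, assuming every sample is eligible each epoch.
epochs = samples_seen / dataset_samples

# Rough training throughput on the listed GPU.
steps_per_second = steps / wall_clock_seconds
samples_per_second = samples_seen / wall_clock_seconds

print(f"samples seen: {samples_seen:,}")
print(f"epochs: {epochs:.1f}")
print(f"throughput: {steps_per_second:.2f} steps/s, {samples_per_second:.1f} samples/s")
```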
|
| 59 |
+
## Project Lineage
|
| 60 |
+
|
| 61 |
+
| Workstream | Model | Steps | Samples | HF |
|
| 62 |
+
|---|---|---|---|---|
|
| 63 |
+
| S001 | ACT | 13,400 | 640K | [act_left](https://huggingface.co/JHeisler/aloha_solo_left_4_6_26_act_left) |
|
| 64 |
+
| **S002** | **Hybrid ACT+Diffusion** | **13,400** | **321K** | **this repo** |
|
| 65 |
+
| S003 | ACT (shipped) | 40,000 | 1.92M | [act_left_40k](https://huggingface.co/JHeisler/aloha_solo_left_4_6_26_act_left_40k) |
|
| 66 |
+
| S004 | Hybrid ACT+Diffusion | 40,000 | 1.12M | [act_diffusion_40k](https://huggingface.co/JHeisler/aloha_solo_left_act_diffusion_40k) |
|
| 67 |
+
|
| 68 |
+
## Notes on loss comparability
|
| 69 |
+
|
| 70 |
+
DDPM noise-prediction MSE (this model) and ACT's combined L1 + KL loss (S001/S003) are different objectives, so absolute loss values are not directly comparable across architectures. A fair comparison uses offline action L1 on held-out episodes or real-robot rollout success rate.
|
| 71 |
+
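A minimal sketch of the offline action L1 metric mentioned above, assuming predicted and ground-truth action chunks are available as nested lists of shape (timesteps, action_dim). The helper names `mean_action_l1` and `mean_l1_over_episodes` are hypothetical; how predictions are produced from each policy is outside this sketch.

```python
def mean_action_l1(predicted: list[list[float]], target: list[list[float]]) -> float:
    """Mean absolute error over all timesteps and action dimensions of one chunk."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of timesteps")
    total, count = 0.0, 0
    for pred_step, true_step in zip(predicted, target):
        if len(pred_step) != len(true_step):
            raise ValueError("action dimensions do not match")
        for p, t in zip(pred_step, true_step):
            total += abs(p - t)
            count += 1
    return total / count


def mean_l1_over_episodes(pairs: list[tuple[list[list[float]], list[list[float]]]]) -> float:
    """Average the per-chunk L1 over (predicted, target) pairs from held-out episodes."""
    return sum(mean_action_l1(p, t) for p, t in pairs) / len(pairs)


# Toy example with 2 timesteps and 2 action dims (the real model uses action_dim=9).
pred = [[0.0, 0.0], [1.0, 1.0]]
true = [[1.0, 0.0], [1.0, 3.0]]
score = mean_action_l1(pred, true)  # (1 + 0 + 0 + 2) / 4 = 0.75
```

Because the metric is computed in action space, it can be applied identically to the ACT and Hybrid checkpoints, unlike their training losses.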
|
| 72 |
+
## Usage
|
| 73 |
+
|
| 74 |
+
The custom policy class lives in this project's LeRobot fork. To use:
|
| 75 |
+
|
| 76 |
+
```python
|
| 77 |
+
# Requires lerobot pinned to 96c7052 with hybrid_act_diffusion policy package added
|
| 78 |
+
from lerobot.common.policies.hybrid_act_diffusion.modeling_hybrid_act_diffusion import HybridACTDiffusionPolicy
|
| 79 |
+
policy = HybridACTDiffusionPolicy.from_pretrained("JHeisler/aloha_solo_left_act_diffusion")
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
## Citation / Course
|
| 83 |
+
|
| 84 |
+
EN.525.681 school project, JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.
|
| 85 |
+
|
| 86 |
+
Code reference: [HuggingFace LeRobot](https://github.com/huggingface/lerobot) at commit `96c7052` with custom hybrid policy package.
|