Action Chunking Transformer (ACT)
Action Chunking with Transformers (ACT) is an imitation-learning policy that predicts short action chunks from robot state and visual observations. The robot can execute those chunks as a sequence of real-world movements.
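Here is a minimal sketch of one way a controller might consume chunks. It queries the policy only when its queue of pending actions is empty, then executes the queued actions one per control step. `fake_policy` is a hypothetical stand-in that returns an array with this model's declared action shape (100 × 6). Real deployments may re-plan more often or blend overlapping chunks.

```python
from collections import deque

import numpy as np

CHUNK_SIZE = 100  # actions per chunk, from the declared output shape [100, 6]
ACTION_DIM = 6    # one value per SO101 joint


def fake_policy(observation: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the policy: returns a (CHUNK_SIZE, ACTION_DIM) chunk."""
    return np.zeros((CHUNK_SIZE, ACTION_DIM), dtype=np.float32)


def run_chunked(num_steps: int) -> tuple[int, list[np.ndarray]]:
    """Execute num_steps actions, querying the policy only when the queue is empty.

    Returns the number of policy calls and the list of executed actions.
    """
    queue: deque = deque()
    policy_calls = 0
    executed = []
    for _ in range(num_steps):
        if not queue:
            observation = np.zeros(ACTION_DIM, dtype=np.float32)  # placeholder state
            queue.extend(fake_policy(observation))  # one row per future step
            policy_calls += 1
        executed.append(queue.popleft())
    return policy_calls, executed


calls, actions = run_chunked(250)
print(calls, len(actions))
```

With 100 actions per chunk, 250 steps need three policy calls: at steps 0, 100, and 200.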
This model was trained and exported with Physical AI Studio for local or Hugging Face-hosted robot inference.
Model Details
- Policy: act
- Runtime library: physicalai
- Generated by: Physical AI Studio
Intended Use
Use this model for robot imitation-learning inference in setups matching the training dataset, robot embodiment, camera viewpoints, and task instructions. Validate behavior in simulation or a safe test cell before running on hardware.
Dataset
This model was trained on the Physical AI Studio dataset named Dice cleanup.
Model Package
Load the model from the package root directory when possible. The root `manifest.json` is the package entry point, and backend-specific manifests live under `exports/<backend>/manifest.json`.
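The following sketch shows how a script could discover which backends a package ships. It assumes only the directory layout described above: a root `manifest.json` plus `exports/<backend>/manifest.json`. The helper name `list_backends` is hypothetical. The sketch does not read manifest contents, because their schema is not documented in this card.

```python
from pathlib import Path


def list_backends(package_root: str) -> dict[str, Path]:
    """Map each backend name to its manifest at exports/<backend>/manifest.json."""
    root = Path(package_root)
    if not (root / "manifest.json").is_file():
        raise FileNotFoundError(f"no root manifest.json in {root}")
    return {
        manifest.parent.name: manifest
        for manifest in sorted(root.glob("exports/*/manifest.json"))
    }
```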
| Backend | Artifact | Intended Use |
|---|---|---|
| torch | exports/torch/act.pt | Canonical checkpoint and Python inference |
| executorch | exports/executorch/act.pte | Edge and mobile runtime experiments |
| onnx | exports/onnx/act.onnx | Runtime portability |
| openvino | exports/openvino/act.xml | CPU, Intel GPU, and NPU inference |
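A script that needs to pick an artifact for a given runtime can express the table as a lookup. This sketch assumes the package layout shown above. `ARTIFACTS` and `artifact_path` are hypothetical names. The right backend depends on which runtime is installed on the target device.

```python
from pathlib import Path

# Backend -> artifact path, relative to the package root (from the table above).
ARTIFACTS = {
    "torch": "exports/torch/act.pt",
    "executorch": "exports/executorch/act.pte",
    "onnx": "exports/onnx/act.onnx",
    "openvino": "exports/openvino/act.xml",
}


def artifact_path(package_root: str, backend: str) -> Path:
    """Return the artifact path for backend, raising ValueError if it is unknown."""
    try:
        relative = ARTIFACTS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(ARTIFACTS)}"
        ) from None
    return Path(package_root) / relative


print(artifact_path("path/to/model", "openvino"))
```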
Training Environment
Environment: So101

```yaml
name: So101
robots:
  - name: SO101 Follower
    type: SO101_Follower
    calibration:
      elbow_flex:
        id: 3
        drive_mode: 0
        homing_offset: 1149
        range_min: 851
        range_max: 3074
      gripper:
        id: 6
        drive_mode: 0
        homing_offset: 1088
        range_min: 1938
        range_max: 3416
      shoulder_lift:
        id: 2
        drive_mode: 0
        homing_offset: 263
        range_min: 821
        range_max: 3195
      shoulder_pan:
        id: 1
        drive_mode: 0
        homing_offset: 135
        range_min: 732
        range_max: 3454
      wrist_flex:
        id: 4
        drive_mode: 0
        homing_offset: -1606
        range_min: 860
        range_max: 3188
      wrist_roll:
        id: 5
        drive_mode: 0
        homing_offset: 612
        range_min: 124
        range_max: 3956
cameras:
  - name: Gripper
    driver: usb_camera
    hardware_name: 'Innomaker-U20CAM-1080p-S1: Inno'
    width: 640
    height: 480
    fps: 30
  - name: Overview
    driver: usb_camera
    hardware_name: 'Innomaker-U20CAM-1080p-S1: Inno'
    width: 640
    height: 480
    fps: 30
```
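For each joint, the calibration block records a motor id and the raw range_min/range_max positions observed during calibration. One way to use these values is as a software limit, and the snippet below sketches that by clamping raw motor targets to the recorded range. It assumes targets are expressed in the same raw units as range_min/range_max. The limits are copied from the environment above. `clamp_raw_targets` is a hypothetical helper, not part of physicalai.

```python
# Raw position limits per joint, copied from the calibration block above.
CALIBRATION = {
    "shoulder_pan": {"id": 1, "range_min": 732, "range_max": 3454},
    "shoulder_lift": {"id": 2, "range_min": 821, "range_max": 3195},
    "elbow_flex": {"id": 3, "range_min": 851, "range_max": 3074},
    "wrist_flex": {"id": 4, "range_min": 860, "range_max": 3188},
    "wrist_roll": {"id": 5, "range_min": 124, "range_max": 3956},
    "gripper": {"id": 6, "range_min": 1938, "range_max": 3416},
}


def clamp_raw_targets(targets: dict[str, int]) -> dict[str, int]:
    """Clamp each joint's raw target into its calibrated [range_min, range_max]."""
    clamped = {}
    for joint, value in targets.items():
        limits = CALIBRATION[joint]
        clamped[joint] = min(max(value, limits["range_min"]), limits["range_max"])
    return clamped


print(clamp_raw_targets({"gripper": 4000, "wrist_roll": 50, "elbow_flex": 2000}))
# {'gripper': 3416, 'wrist_roll': 124, 'elbow_flex': 2000}
```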
I/O Specification
Shared by the executorch, onnx, openvino, and torch backends.
Inputs
| Name | Type | Shape | Dtype |
|---|---|---|---|
| state | STATE | [6] | float32 |
| images.gripper | VISUAL | [3, 480, 640] | float32 |
| images.overview | VISUAL | [3, 480, 640] | float32 |
Outputs
| Name | Type | Shape | Dtype |
|---|---|---|---|
| action | ACTION | [100, 6] | float32 |
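Checking real observations against the declared specification before sending them can catch mistakes early. This sketch encodes the Inputs table as a dictionary. It checks that each array is float32 and has the declared shape after a leading batch dimension, matching the batch of 1 used in the smoke test. `INPUT_SPEC` and `check_observation` are hypothetical names, not physicalai APIs.

```python
import numpy as np

# Per-sample shapes and dtypes from the Inputs table above.
INPUT_SPEC = {
    "state": ((6,), np.float32),
    "images.gripper": ((3, 480, 640), np.float32),
    "images.overview": ((3, 480, 640), np.float32),
}


def check_observation(observation: dict[str, np.ndarray]) -> list[str]:
    """Return a list of problems; an empty list means the observation matches the spec."""
    problems = []
    for name, (shape, dtype) in INPUT_SPEC.items():
        if name not in observation:
            problems.append(f"missing input {name!r}")
            continue
        array = observation[name]
        if array.dtype != dtype:
            problems.append(f"{name}: dtype {array.dtype}, expected {np.dtype(dtype)}")
        # Skip the leading batch dimension when comparing shapes.
        if array.shape[1:] != shape:
            problems.append(f"{name}: shape {array.shape}, expected (batch,) + {shape}")
    return problems
```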
Running Inference
Installation
```bash
uv pip install physicalai numpy
```
The following smoke test verifies that the package loads and accepts inputs with the declared shapes, each with a leading batch dimension of 1. Replace the dummy values with observations from your robot runtime before using the model for control.
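```python
import numpy as np
from physicalai.inference import InferenceModel

# Local model directory or Hugging Face repository id.
MODEL_PATH = "path/to/model"

model = InferenceModel.load(MODEL_PATH, device="CPU")

# Dummy observation: each input has a leading batch dimension of 1,
# followed by the shape declared in the I/O specification.
observation = {
    "state": np.random.rand(1, 6).astype(np.float32),
    "images.gripper": np.random.rand(1, 3, 480, 640).astype(np.float32),
    "images.overview": np.random.rand(1, 3, 480, 640).astype(np.float32),
}

# Predict a chunk of future actions (the "action" output above).
chunk = model.predict_action_chunk(observation)
```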
Set MODEL_PATH to this local model directory or to the Hugging Face repository id after upload.
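Camera drivers typically deliver frames as height × width × channel uint8 arrays. The model, by contrast, declares channel-first float32 images of shape [3, 480, 640]. The sketch below performs that conversion and adds the batch dimension used in the smoke test. Two points are assumptions to verify against the training pipeline. First, that frames are RGB (OpenCV, for example, returns BGR by default). Second, that pixel values are scaled to [0, 1]. `frame_to_input` is a hypothetical helper.

```python
import numpy as np


def frame_to_input(frame: np.ndarray, bgr: bool = False) -> np.ndarray:
    """Convert an HxWx3 uint8 frame into a (1, 3, H, W) float32 array.

    Assumes the model expects RGB values scaled to [0, 1]; check this against
    the preprocessing used during training.
    """
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 frame, got shape {frame.shape}")
    if bgr:
        frame = frame[..., ::-1]  # reverse channel order: BGR -> RGB
    chw = np.transpose(frame, (2, 0, 1))  # HWC -> CHW
    scaled = np.ascontiguousarray(chw, dtype=np.float32) / 255.0
    return scaled[np.newaxis]  # add leading batch dimension


frame = np.full((480, 640, 3), 255, dtype=np.uint8)
print(frame_to_input(frame).shape)  # (1, 3, 480, 640)
```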
Running A Robot Control Loop
For a blocking control loop, start from PhysicalAI's examples/runtime/sync_inference.py and the robot and camera names from the training environment above. The local device handles below (serial port and camera paths) are placeholders. Ports, camera paths, and stream URLs are not included in published model metadata, so replace them with the values for your setup.
```bash
python examples/runtime/sync_inference.py \
  --robot so101 \
  --port /dev/ttyACM0 \
  --calibration ./calibration.json \
  --model path/to/model \
  --camera gripper:uvc:/dev/video0 \
  --camera overview:uvc:/dev/video1 \
  --task "Move the dice into the cup" \
  --device CPU
```
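A blocking loop generally follows the same pattern: read an observation, compute or dequeue an action, send it, then sleep until the next control tick. The sketch below covers only the timing part, using stand-in callbacks. It is not the contents of sync_inference.py. `run_fixed_rate` and `step` are hypothetical names. The 30 Hz rate is also an assumption: it matches the camera frame rate above, but this card does not state the training control frequency.

```python
import time
from typing import Callable


def run_fixed_rate(step: Callable[[int], None], hz: float, num_steps: int) -> float:
    """Call step(i) num_steps times at roughly hz; return the elapsed seconds.

    If a step overruns its period, the next step starts immediately instead of
    trying to catch up with a burst of calls.
    """
    period = 1.0 / hz
    start = time.perf_counter()
    next_tick = start
    for i in range(num_steps):
        step(i)  # e.g. read observation, pick action, send it to the robot
        next_tick += period
        delay = next_tick - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        else:
            next_tick = time.perf_counter()  # overrun: reset the schedule
    return time.perf_counter() - start


ticks = []
elapsed = run_fixed_rate(ticks.append, hz=30.0, num_steps=10)
print(len(ticks), round(elapsed, 2))
```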
Training / Reproducing Training
Import this model in Physical AI Studio and start a new training job using it as the base model. Studio will preserve the training lineage through the parent model relationship.
To reproduce behavior on your own hardware, match the exported I/O specification, robot type, camera viewpoints, control frequency, and calibration values from environment.json as closely as possible.
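As a pre-flight step, local camera settings can be compared with the training environment before the policy runs. This sketch copies the `TRAINING_CAMERAS` values from the environment above. Its keys use the lowercase names from the model inputs. `local` stands for whatever configuration your runtime uses, and `camera_mismatches` is a hypothetical helper. Matching resolution and frame rate does not guarantee matching viewpoints, which still need a visual check.

```python
# Camera settings from the training environment above.
TRAINING_CAMERAS = {
    "gripper": {"width": 640, "height": 480, "fps": 30},
    "overview": {"width": 640, "height": 480, "fps": 30},
}


def camera_mismatches(local: dict[str, dict[str, int]]) -> list[str]:
    """List differences between local camera settings and the training cameras."""
    issues = []
    for name, expected in TRAINING_CAMERAS.items():
        actual = local.get(name)
        if actual is None:
            issues.append(f"camera {name!r} is not configured")
            continue
        for key, value in expected.items():
            if actual.get(key) != value:
                issues.append(f"{name}.{key}: {actual.get(key)} (training used {value})")
    return issues


print(camera_mismatches({"gripper": {"width": 640, "height": 480, "fps": 15}}))
```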
Evaluation
No task-specific evaluation metrics were exported with this generated card. Add validation results, success rates, and hardware test conditions before publishing externally.
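When adding evaluation results, report the number of trials alongside the success rate, because a rate computed from a handful of hardware runs is noisy. One possible choice, not something this card prescribes, is a 95% Wilson score interval around the success rate, sketched below. The numbers in the example are illustrative only and are not results for this model.

```python
import math


def success_rate_wilson(
    successes: int, trials: int, z: float = 1.96
) -> tuple[float, float, float]:
    """Return (rate, low, high) using the Wilson score interval (z=1.96 is ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return p, center - half, center + half


# Illustrative numbers only, not results for this model.
rate, low, high = success_rate_wilson(8, 10)
print(f"{rate:.0%} success (95% CI {low:.0%} to {high:.0%}) over 10 trials")
```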
Limitations And Safety
Robot policies can behave unpredictably outside their training distribution. Validate camera viewpoints, lighting, object placement, calibration values, robot embodiment, and task wording before autonomous operation. Use hardware limits, emergency stops, supervision, and staged validation.
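Software limits complement, but do not replace, hardware limits and an emergency stop. A common software safeguard is to cap how far each joint command may move between consecutive control steps. The sketch below applies such a per-step limit to a predicted chunk. The `max_delta` value and its units are assumptions that depend on how your runtime represents actions, and `limit_step_size` is a hypothetical helper.

```python
import numpy as np


def limit_step_size(current: np.ndarray, chunk: np.ndarray, max_delta: float) -> np.ndarray:
    """Return a copy of chunk where each row moves at most max_delta per joint
    away from the previous command, starting from the current joint command."""
    limited = np.empty_like(chunk)
    previous = current.astype(chunk.dtype)
    for t, target in enumerate(chunk):
        step = np.clip(target - previous, -max_delta, max_delta)
        previous = previous + step
        limited[t] = previous
    return limited


current = np.zeros(6, dtype=np.float32)
chunk = np.tile(np.array([10, -10, 0.5, 0, 0, 0], dtype=np.float32), (100, 1))
print(limit_step_size(current, chunk, max_delta=2.0)[:3, :2])
```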