Go2+Z1 Walking Policy (V1, state-only PPO)
PPO walking policy for the Unitree Go2 + Z1 composite robot (12 leg DOFs + 6 arm DOFs = 18 DOF), trained in Isaac Lab on flat ground while holding the Z1 arm folded on the back.
Highlights
- Backbone: rsl-rl `OnPolicyRunner` actor-critic (MLP 512-256-128, ELU)
- Task: `Isaac-Velocity-Flat-Go2Z1-v0` (forward/lateral linear velocity + small yaw-rate commands)
- 4096 parallel envs × 1500 PPO iterations on a single RTX PRO 6000 Blackwell (96 GB)
- Z1 arm forced to remain in the folded "startFlat" pose during locomotion
- Verified: walks 10 m inside the real `Simple_Warehouse/warehouse.usd` (3/3 episodes)
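For a rough sense of scale, the sample budget implied by the environment and iteration counts above can be estimated as follows. The rollout length of 24 steps per environment per iteration is an assumption (a value often seen in rsl-rl locomotion configs), not something stated in this card; substitute the value from the actual runner config.

```python
# Back-of-the-envelope sample count for the training run described above.
NUM_ENVS = 4096          # from this card
PPO_ITERATIONS = 1500    # from this card
STEPS_PER_ENV = 24       # ASSUMPTION: rollout length per iteration, not stated in this card


def total_env_steps(num_envs: int, iterations: int, steps_per_env: int) -> int:
    """Total simulator transitions collected across all PPO iterations."""
    return num_envs * iterations * steps_per_env


samples_per_iteration = NUM_ENVS * STEPS_PER_ENV
total = total_env_steps(NUM_ENVS, PPO_ITERATIONS, STEPS_PER_ENV)
print(f"{samples_per_iteration:,} transitions per iteration")
print(f"{total:,} transitions in total")
```

Under that assumption the run collects roughly 147 million transitions; a different rollout length scales the figure linearly.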
Files
`model_*.pt`: checkpoint dictionaries with `actor_state_dict` / `critic_state_dict`
Architecture
Actor MLP : Linear(obs→512) ELU Linear(512→256) ELU Linear(256→128) ELU Linear(128→12)
Critic MLP: same shape, single value head
Inputs : base lin_vel + ang_vel + projected_gravity + commands + joint_pos + joint_vel + last_action
Outputs : 12 leg joint position deltas (Go2 hip/thigh/calf × 4)
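Because the policy outputs position deltas rather than absolute joint angles, each action has to be scaled and added to a nominal pose before it reaches the leg joints. The sketch below illustrates that mapping in plain Python. The scale of 0.25, the default joint angles, and the joint ordering are all assumptions chosen for illustration; the real values are defined in the task config.

```python
# Sketch: turn the actor's 12 outputs into leg joint position targets.
# ASSUMPTIONS (not from this card): ACTION_SCALE, DEFAULT_LEG_POS and the joint order.
ACTION_SCALE = 0.25

# Hypothetical nominal standing pose in radians, ordered (hip, thigh, calf) per leg.
DEFAULT_LEG_POS = {
    "FL": (0.1, 0.8, -1.5),
    "FR": (-0.1, 0.8, -1.5),
    "RL": (0.1, 1.0, -1.5),
    "RR": (-0.1, 1.0, -1.5),
}


def action_to_joint_targets(action: list[float]) -> list[float]:
    """Map 12 position deltas to absolute targets: default + scale * delta."""
    defaults = [q for leg in ("FL", "FR", "RL", "RR") for q in DEFAULT_LEG_POS[leg]]
    if len(action) != len(defaults):
        raise ValueError(f"expected {len(defaults)} actions, got {len(action)}")
    return [q0 + ACTION_SCALE * a for q0, a in zip(defaults, action)]


# A zero action keeps the legs at the nominal pose.
targets = action_to_joint_targets([0.0] * 12)
print(targets)
```

The six Z1 arm joints are deliberately absent from this mapping, consistent with the arm being held in its folded pose while the policy drives only the legs.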
Usage
import torch
import torch.nn as nn

# Load the checkpoint (a dict holding actor_state_dict / critic_state_dict)
state = torch.load("model_1499.pt", map_location="cuda:0", weights_only=False)
sd = state["actor_state_dict"]

# Rebuild the actor (3 hidden layers + output). Each layer width is read from
# the checkpoint because the hidden sizes differ (512-256-128).
obs_dim = sd["mlp.0.weight"].shape[1]
h1, h2, h3 = (sd[f"mlp.{i}.weight"].shape[0] for i in (0, 2, 4))
act_dim = sd["mlp.6.weight"].shape[0]
actor = nn.Sequential(
    nn.Linear(obs_dim, h1), nn.ELU(),
    nn.Linear(h1, h2), nn.ELU(),
    nn.Linear(h2, h3), nn.ELU(),
    nn.Linear(h3, act_dim),
).cuda().eval()

# Strip the "mlp." prefix so the keys match nn.Sequential ("0.weight", "2.weight", ...)
actor.load_state_dict({k.replace("mlp.", "", 1): v for k, v in sd.items() if k.startswith("mlp.")})

# obs comes from Isaac Lab's Isaac-Velocity-Flat-Go2Z1-Play-v0 env
with torch.inference_mode():
    action = actor(obs)
For end-to-end inference inside Isaac Sim, see stage4_joint_eval/walk_in_real_warehouse.py.
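Before rebuilding the network, it can help to confirm that a checkpoint really has the documented 512-256-128 layout. The helper below is a minimal sketch that reads layer sizes from weight shapes; `infer_mlp_dims` is a hypothetical name, and the `mlp.<index>.weight` key pattern is taken from the loading snippet above. It accepts any mapping whose values expose `.shape`, so it is demonstrated here with stand-in objects rather than a real checkpoint.

```python
from types import SimpleNamespace


def infer_mlp_dims(state_dict, prefix="mlp."):
    """Return [in_dim, hidden..., out_dim] from '<prefix><i>.weight' entries.

    Each weight is expected to have shape (out_features, in_features),
    which is how torch.nn.Linear stores it.
    """
    layers = []
    for key, value in state_dict.items():
        if key.startswith(prefix) and key.endswith(".weight"):
            index = int(key[len(prefix):-len(".weight")])
            layers.append((index, tuple(value.shape)))
    if not layers:
        raise KeyError(f"no '{prefix}<i>.weight' entries found")
    layers.sort()
    dims = [layers[0][1][1]]  # input size of the first Linear
    for _, (out_features, in_features) in layers:
        if in_features != dims[-1]:
            raise ValueError("consecutive layer shapes do not chain")
        dims.append(out_features)
    return dims


# Stand-in for an actor_state_dict; the observation size 48 is a placeholder,
# not the real value for this policy.
fake_sd = {
    "mlp.0.weight": SimpleNamespace(shape=(512, 48)),
    "mlp.2.weight": SimpleNamespace(shape=(256, 512)),
    "mlp.4.weight": SimpleNamespace(shape=(128, 256)),
    "mlp.6.weight": SimpleNamespace(shape=(12, 128)),
}
dims = infer_mlp_dims(fake_sd)
print(dims)  # [48, 512, 256, 128, 12]
```

With a real checkpoint, passing `state["actor_state_dict"]` should return the true observation size as the first entry and 12 as the last.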
Training data
This is an on-policy RL model — no offline dataset is used. The policy is trained from scratch by interacting with the simulator. The full task definition (rewards, observations, terminations) lives in:
- Repo: https://github.com/aws300/go2_z1_warehouse
- Task config: `go2_z1_warehouse/stage1_walking/{flat_env_cfg.py, rough_env_cfg.py}`
Eval results
| Scenario | Episodes | Success | Mean traveled |
|---|---|---|---|
| Flat plane | 10 | 100 % | — |
| 4 cuboid shelves | 5 | 80 % | 11.21 m |
| Real warehouse.usd | 3 | 100 % | 10.00 m |
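The per-scenario rows can be pooled into a single success figure with a few lines of arithmetic. The sketch below uses only the episode counts and success percentages from the table; recovering integer success counts by rounding is an assumption about how the percentages were computed.

```python
# Pool the table rows into one overall success rate.
# (scenario, episodes, success percentage) copied from the table above.
ROWS = [
    ("Flat plane", 10, 100.0),
    ("4 cuboid shelves", 5, 80.0),
    ("Real warehouse.usd", 3, 100.0),
]


def pooled_success(rows):
    """Return (successes, episodes), assuming each percentage is successes / episodes."""
    successes = sum(round(episodes * pct / 100.0) for _, episodes, pct in rows)
    episodes = sum(episodes for _, episodes, _ in rows)
    return successes, episodes


successes, episodes = pooled_success(ROWS)
print(f"{successes}/{episodes} episodes succeeded ({100.0 * successes / episodes:.1f} %)")
```

With only 18 episodes in total, the pooled figure is a coarse estimate and should be read alongside the per-scenario rows.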
Citation
@misc{go2z1-walking-v1,
  title  = {Go2+Z1 Warehouse Walking Policy V1 (state-only PPO)},
  author = {m3},
  year   = {2026},
  url    = {https://huggingface.co/m3/go2z1-walking-rsl-rl-v1}
}
Successor
- V2 (rotation-capable + heading-tracking): m3/go2z1-walking-rsl-rl-v2