Go2+Z1 Walking Policy (V1, state-only PPO)

PPO walking policy for the Unitree Go2 + Z1 composite robot (12 leg DOFs + 6 arm DOFs = 18 DOF). The policy is trained in Isaac Lab on flat ground with the Z1 arm held folded on the robot's back.

Highlights

Backbone: rsl-rl OnPolicyRunner actor-critic (MLP 512-256-128, ELU)
Task: Isaac-Velocity-Flat-Go2Z1-v0 (forward/lateral linear vel + small yaw rate commands)
Training: 4096 parallel envs × 1500 PPO iterations on a single RTX PRO 6000 Blackwell (96 GB)
Z1 arm forced to remain in the folded "startFlat" pose during locomotion
Verified: walks 10 m inside the real Simple_Warehouse/warehouse.usd (3/3 episodes)
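
For a rough sense of the sample budget behind the training figures above, the number of environment transitions is envs × rollout steps per env × iterations. The rollout length is not stated in this card; the sketch below assumes 24 steps per env per iteration, a value often seen in rsl-rl locomotion configs, purely for illustration.

```python
def total_transitions(num_envs: int, steps_per_env: int, iterations: int) -> int:
    """Transitions collected over training.

    Each PPO iteration gathers one rollout of `steps_per_env` steps
    from every parallel environment.
    """
    return num_envs * steps_per_env * iterations


NUM_ENVS = 4096     # stated in the highlights
ITERATIONS = 1500   # stated in the highlights
STEPS_PER_ENV = 24  # ASSUMPTION: not stated in this card, check the runner config

total = total_transitions(NUM_ENVS, STEPS_PER_ENV, ITERATIONS)
print(f"{total:,} transitions under the assumed rollout length")
```

If the actual runner config uses a different rollout length, the total scales linearly with it.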

Files

model_*.pt — checkpoint dictionaries with actor_state_dict / critic_state_dict

Architecture

Actor MLP : Linear(obs→512) ELU Linear(512→256) ELU Linear(256→128) ELU Linear(128→12)
Critic MLP: same shape, single value head
Inputs    : base lin_vel + ang_vel + projected_gravity + commands + joint_pos + joint_vel + last_action
Outputs   : 12 leg joint position deltas (Go2 hip/thigh/calf × 4)
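
The layer list above fixes every dimension except the observation width, which depends on how many joints the joint_pos and joint_vel terms cover. The following sketch computes the observation width and the MLP weight-and-bias count for two hypothetical cases (legs only, or legs plus arm). Which case applies is an assumption to check against the task config, and the helper names are illustrative only.

```python
from typing import Sequence


def obs_dim(n_joints_observed: int, n_actions: int = 12) -> int:
    """Observation width for the input terms listed above."""
    # base lin_vel (3) + ang_vel (3) + projected_gravity (3) + commands (vx, vy, yaw rate = 3)
    base_terms = 3 + 3 + 3 + 3
    # joint_pos and joint_vel each contribute one value per observed joint;
    # last_action has one value per policy output.
    return base_terms + 2 * n_joints_observed + n_actions


def mlp_param_count(sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected stack with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# ASSUMPTION: 12 = leg joints only, 18 = leg + arm joints; the card does not say which.
for n_joints in (12, 18):
    d = obs_dim(n_joints)
    actor_sizes = [d, 512, 256, 128, 12]
    print(f"joints observed={n_joints}: obs_dim={d}, actor MLP params={mlp_param_count(actor_sizes)}")
```

The count covers the MLP layers only; any extra parameters stored alongside them in the checkpoint (for example an action noise term) are not included.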

Usage

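import torch
import torch.nn as nn

# Load checkpoint (weights_only=False unpickles arbitrary objects, so only load files you trust)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
state = torch.load("model_1499.pt", map_location=device, weights_only=False)
sd = state["actor_state_dict"]

# Rebuild actor (3 hidden layers + output), reading each layer width from the checkpoint.
# The hidden layers differ in width (512-256-128), so a single shared width would not load.
obs_dim = sd["mlp.0.weight"].shape[1]
h1, h2, h3 = (sd[f"mlp.{i}.weight"].shape[0] for i in (0, 2, 4))
act_dim = sd["mlp.6.weight"].shape[0]
actor = nn.Sequential(
    nn.Linear(obs_dim, h1), nn.ELU(),
    nn.Linear(h1, h2), nn.ELU(),
    nn.Linear(h2, h3), nn.ELU(),
    nn.Linear(h3, act_dim),
).to(device).eval()
# Strip the "mlp." prefix so keys match nn.Sequential's "0.weight", "2.weight", ...
actor.load_state_dict({k.replace("mlp.", "", 1): v for k, v in sd.items() if k.startswith("mlp.")})

# obs comes from Isaac Lab's Isaac-Velocity-Flat-Go2Z1-Play-v0 env, shape (num_envs, obs_dim)
with torch.inference_mode():
    action = actor(obs)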

For end-to-end inference inside Isaac Sim, see stage4_joint_eval/walk_in_real_warehouse.py.
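
The actor's 12 outputs are described above as leg joint position deltas. A minimal sketch of turning them into position targets follows, assuming the common Isaac Lab convention of scaling the action and adding it to a default joint pose. The scale, the default pose values, and the per-leg joint ordering below are hypothetical placeholders, not values taken from this repo.

```python
from typing import List, Sequence

ACTION_SCALE = 0.25  # ASSUMPTION: hypothetical value, check the task's action config
# ASSUMPTION: hypothetical hip/thigh/calf angles (rad), repeated per leg for illustration;
# the simulator's real joint ordering and default pose may differ.
DEFAULT_LEG_POSE: List[float] = [0.0, 0.8, -1.5] * 4


def leg_targets(
    action: Sequence[float],
    default_pose: Sequence[float] = DEFAULT_LEG_POSE,
    scale: float = ACTION_SCALE,
) -> List[float]:
    """Map raw policy outputs to joint position targets: target = default + scale * action."""
    if len(action) != len(default_pose):
        raise ValueError(f"expected {len(default_pose)} actions, got {len(action)}")
    return [q0 + scale * a for q0, a in zip(default_pose, action)]


# A zero action leaves the legs at the default pose.
print(leg_targets([0.0] * 12))
```

With real rollouts the environment applies this mapping internally, so the sketch matters mainly when driving joints outside the Isaac Lab action manager.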

Training data

This is an on-policy RL model — no offline dataset is used. The policy is trained from scratch by interacting with the simulator. The full task definition (rewards, observations, terminations) lives in:

Repo: https://github.com/aws300/go2_z1_warehouse
Task config: go2_z1_warehouse/stage1_walking/{flat_env_cfg.py, rough_env_cfg.py}

Eval results

Scenario	Episodes	Success rate	Mean distance traveled
Flat plane	10	100 %	—
4 cuboid shelves	5	80 %	11.21 m
Real `warehouse.usd`	3	100 %	10.00 m
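
To read the table as a single figure, the per-scenario rates can be converted back to episode counts and pooled. The sketch below does that arithmetic; the success counts are inferred from the reported rates rather than stated directly, and pooling across scenarios of different difficulty is only a convenience summary.

```python
# (scenario, episodes, success rate) copied from the table above
results = [
    ("Flat plane", 10, 1.00),
    ("4 cuboid shelves", 5, 0.80),
    ("Real warehouse.usd", 3, 1.00),
]


def success_count(episodes: int, rate: float) -> int:
    """Number of successful episodes implied by a reported success rate."""
    return round(episodes * rate)


total_episodes = sum(episodes for _, episodes, _ in results)
total_successes = sum(success_count(episodes, rate) for _, episodes, rate in results)
pooled_rate = total_successes / total_episodes

print(f"{total_successes}/{total_episodes} episodes succeeded ({pooled_rate:.1%} pooled)")
```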

Citation

@misc{go2z1-walking-v1,
  title  = {Go2+Z1 Warehouse Walking Policy V1 (state-only PPO)},
  author = {m3},
  year   = {2026},
  url    = {https://huggingface.co/m3/go2z1-walking-rsl-rl-v1}
}

Successor

V2 (rotation-capable + heading-tracking): m3/go2z1-walking-rsl-rl-v2
