Go2+Z1 Walking Policy V2 (rotation-capable + heading-tracking)
PPO walking policy for the Unitree Go2 + Z1 composite robot, upgraded over V1 to support large in-place rotations and heading commands, both of which are needed for autonomous navigation through warehouse aisles.
What's new vs V1
| | V1 | V2 |
|---|---|---|
| Yaw command range ω_z | [-0.5, 0.5] rad/s | [-2.0, 2.0] rad/s (covers 180° pivot) |
| Heading command | none | 30% of envs receive heading command (heading_command=True, rel_heading_envs=0.3) |
| track_ang_vel_z_exp reward weight | 0.75 | 1.2 |
| Iterations | 1500 | 3000 |
| Task ID | Isaac-Velocity-Flat-Go2Z1-v0 | Isaac-Velocity-Flat-Go2Z1-V2-v0 |
V1 oscillated whenever the commanded yaw error exceeded ~90°. V2 fixes that by giving the policy direct experience with large angular commands during training.
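To make the training-time change concrete, here is a minimal sketch of how yaw commands could be sampled under the V2 settings. It is not taken from the task config: the function name `sample_yaw_command`, the proportional gain `HEADING_STIFFNESS`, and the uniform heading target in [-π, π) are assumptions for illustration. Only the yaw ranges and the 0.3 heading fraction come from the table.

```python
import math
import random

YAW_RANGE_V1 = (-0.5, 0.5)  # rad/s, V1 range from the table
YAW_RANGE_V2 = (-2.0, 2.0)  # rad/s, V2 range from the table
REL_HEADING_ENVS = 0.3      # fraction of envs that receive a heading command (V2)
HEADING_STIFFNESS = 0.5     # hypothetical P-gain, NOT taken from the source


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def sample_yaw_command(cur_heading, rng, yaw_range=YAW_RANGE_V2,
                       rel_heading_envs=REL_HEADING_ENVS):
    """Return (yaw_rate_command, is_heading_env) for one environment."""
    lo, hi = yaw_range
    if rng.random() < rel_heading_envs:
        # Heading env: derive the yaw rate from the wrapped heading error.
        target = rng.uniform(-math.pi, math.pi)
        err = wrap_to_pi(target - cur_heading)
        return max(lo, min(hi, HEADING_STIFFNESS * err)), True
    # Velocity env: sample the yaw rate directly from the command range.
    return rng.uniform(lo, hi), False


rng = random.Random(0)
samples = [sample_yaw_command(0.0, rng) for _ in range(10_000)]
heading_fraction = sum(is_heading for _, is_heading in samples) / len(samples)
```

Passing `yaw_range=YAW_RANGE_V1` reproduces the narrower V1 distribution, in which the policy never sees a command above 0.5 rad/s in magnitude. A real simulator would typically recompute the heading-derived yaw rate at every step rather than once per sample; this sketch simplifies that.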
Files
- model_*.pt: actor-critic checkpoint (rsl-rl OnPolicyRunner format)
Usage
Identical to V1 (same architecture). See V1 README for code: https://huggingface.co/m3/go2z1-walking-rsl-rl-v1
For end-to-end inference inside Isaac Sim, the goal-directed nav script that drives this policy is:
# Pseudocode — see go2_z1_warehouse/stage4_joint_eval/walk_warehouse_navigate.py
yaw_err = (target_yaw - cur_yaw + π) % 2π - π
v_fwd = clip(0.8 * cos(yaw_err), 0.2, 0.8)
w_z = clip(1.5 * yaw_err, -2.0, 2.0) # V2 supports the full range
cmd_term.vel_command_b[:] = (v_fwd, 0, w_z)
action = actor(obs)
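The command computation in the pseudocode above can be written as plain, runnable Python. This is a sketch of the arithmetic only, not the contents of walk_warehouse_navigate.py: the function names `wrap_to_pi`, `clip`, and `nav_command` and the keyword parameters are hypothetical, while the constants (0.8, 0.2, 1.5, 2.0) are the ones shown in the pseudocode. Writing the command into the simulator and running the actor are left out.

```python
import math


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def nav_command(target_yaw: float, cur_yaw: float,
                v_max: float = 0.8, v_min: float = 0.2,
                k_yaw: float = 1.5, w_max: float = 2.0):
    """Return (v_x, v_y, w_z) in the base frame for a goal heading."""
    yaw_err = wrap_to_pi(target_yaw - cur_yaw)
    # Slow down when facing away from the goal, but never below v_min.
    v_fwd = clip(v_max * math.cos(yaw_err), v_min, v_max)
    # Proportional yaw-rate command, saturated at the V2 training range.
    w_z = clip(k_yaw * yaw_err, -w_max, w_max)
    return v_fwd, 0.0, w_z


aligned = nav_command(target_yaw=0.0, cur_yaw=0.0)
quarter_turn = nav_command(target_yaw=math.pi / 2, cur_yaw=0.0)
```

With these constants the yaw rate saturates at 2.0 rad/s once the heading error exceeds about 1.33 rad (roughly 76°), so large turns do exercise the full V2 command range, and the forward speed drops to the 0.2 floor whenever the error is 90° or more.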
Training data
On-policy RL — no offline dataset. The full task definition lives in:
- Repo: https://github.com/aws300/go2_z1_warehouse
- V2 task config: go2_z1_warehouse/stage1_walking/flat_v2_env_cfg.py
- Auto-pipeline launcher: go2_z1_warehouse/v2_pipeline/auto_chain.sh
Predecessor
- V1 (yaw limited to ±0.5 rad/s, no heading command): m3/go2z1-walking-rsl-rl-v1
Citation
@misc{go2z1-walking-v2,
title = {Go2+Z1 Walking Policy V2 (rotation-capable + heading-tracking)},
author = {m3},
year = {2026},
url = {https://huggingface.co/m3/go2z1-walking-rsl-rl-v2}
}