Go2+Z1 Walking Policy V2 (rotation-capable + heading-tracking)

A PPO walking policy for the Unitree Go2 + Z1 composite robot. It upgrades V1 to support large in-place rotations and heading commands, both of which are needed for autonomous navigation through warehouse aisles.

What's new vs V1

| Setting | V1 | V2 |
|---|---|---|
| Yaw command range `ω_z` | [-0.5, 0.5] rad/s | [-2.0, 2.0] rad/s (covers 180° pivot) |
| Heading command | none | 30% of envs receive a `heading` command (`heading_command=True`, `rel_heading_envs=0.3`) |
| `track_ang_vel_z_exp` reward weight | 0.75 | 1.2 |
| Iterations | 1500 | 3000 |
| Task ID | `Isaac-Velocity-Flat-Go2Z1-v0` | `Isaac-Velocity-Flat-Go2Z1-V2-v0` |
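To make the `track_ang_vel_z_exp` row concrete, here is a minimal scalar sketch of an exponential yaw-rate tracking reward of the kind that name usually refers to in Isaac Lab. The weights 0.75 and 1.2 come from the table; the `std` value of 0.5 and the function names are assumptions for illustration, not values taken from this repository's config.

```python
import math

V1_WEIGHT = 0.75  # from the table above
V2_WEIGHT = 1.2   # from the table above


def track_ang_vel_z_exp(cmd_wz, actual_wz, std=0.5):
    """Exponential tracking kernel: 1.0 at zero error, decaying with squared error.

    std is an assumed tolerance in rad/s; this README does not state it.
    """
    err_sq = (cmd_wz - actual_wz) ** 2
    return math.exp(-err_sq / std**2)


def weighted_yaw_reward(cmd_wz, actual_wz, weight):
    """Reward term contribution before any per-step time scaling."""
    return weight * track_ang_vel_z_exp(cmd_wz, actual_wz)


# Compare what each version pays for perfect tracking and for a 0.5 rad/s error.
for label, weight in (("V1", V1_WEIGHT), ("V2", V2_WEIGHT)):
    perfect = weighted_yaw_reward(2.0, 2.0, weight)
    off_by_half = weighted_yaw_reward(2.0, 1.5, weight)
    print(f"{label}: perfect={perfect:.3f}, 0.5 rad/s error={off_by_half:.3f}")
```

Under these assumptions the higher V2 weight raises the payoff for accurate yaw tracking relative to the other reward terms, while the shape of the kernel stays the same.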

V1 oscillated whenever the commanded yaw error exceeded ~90°. V2 fixes that by giving the policy direct experience with large angular commands during training.
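The sketch below illustrates what "direct experience with large angular commands" can look like at command-sampling time. It is a simplified, single-environment approximation and not the repository's code: the yaw range and the 30% heading fraction come from the table, while `HEADING_STIFFNESS`, the heading target range of [-π, π], and all function names are assumptions.

```python
import math
import random

YAW_RATE_RANGE = (-2.0, 2.0)  # rad/s, V2 range from the table
REL_HEADING_ENVS = 0.3        # fraction of envs given a heading target
HEADING_STIFFNESS = 0.5       # assumed proportional gain; not stated in this README


def wrap_to_pi(angle):
    """Wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def sample_yaw_command(current_heading, rng):
    """Return (mode, w_z) for one environment at command-resampling time."""
    lo, hi = YAW_RATE_RANGE
    if rng.random() < REL_HEADING_ENVS:
        # Heading env: the yaw rate is derived from the wrapped heading error.
        target = rng.uniform(-math.pi, math.pi)
        w_z = HEADING_STIFFNESS * wrap_to_pi(target - current_heading)
        return "heading", max(lo, min(hi, w_z))
    # Velocity env: the yaw rate is drawn directly from the full range.
    return "yaw_rate", rng.uniform(lo, hi)


rng = random.Random(0)
samples = [sample_yaw_command(0.0, rng) for _ in range(10_000)]
heading_fraction = sum(mode == "heading" for mode, _ in samples) / len(samples)
print(f"heading envs: {heading_fraction:.2%}")
```

With the V1 range of [-0.5, 0.5] rad/s substituted into `YAW_RATE_RANGE`, the same sampler would never produce the fast turn rates that a large yaw error calls for.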

Files

model_*.pt — actor-critic checkpoint (rsl-rl OnPolicyRunner format)
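When several `model_*.pt` files are present, a small helper can pick the most recent one. This sketch assumes the wildcard stands for the training iteration (for example `model_2999.pt`), which is how rsl-rl runners commonly name checkpoints; the helper name `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path


def latest_checkpoint(directory):
    """Return the model_<iteration>.pt path with the largest iteration, or None."""
    best = None
    for path in Path(directory).glob("model_*.pt"):
        match = re.fullmatch(r"model_(\d+)\.pt", path.name)
        if match is None:
            continue  # skip files whose suffix is not a plain integer
        iteration = int(match.group(1))
        if best is None or iteration > best[0]:
            best = (iteration, path)
    return None if best is None else best[1]
```

Comparing iterations as integers avoids the lexicographic trap where `model_300.pt` would sort after `model_2999.pt`.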

Usage

Loading and inference are identical to V1 because the network architecture is unchanged. See the V1 README for code: https://huggingface.co/m3/go2z1-walking-rsl-rl-v1

For end-to-end inference inside Isaac Sim, a goal-directed navigation script drives this policy. Its core control step is:

# Pseudocode — see go2_z1_warehouse/stage4_joint_eval/walk_warehouse_navigate.py
yaw_err = (target_yaw - cur_yaw + π) % (2π) - π   # wrap to [-π, π)
v_fwd   = clip(0.8 * cos(yaw_err), 0.2, 0.8)
w_z     = clip(1.5 * yaw_err, -2.0, 2.0)   # V2 supports the full range
cmd_term.vel_command_b[:] = (v_fwd, 0, w_z)
action = actor(obs)
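The command computation in the pseudocode above can be written as a standalone Python function. This is a sketch for checking the arithmetic outside the simulator: the gains and limits are copied from the pseudocode, the function names `clip` and `nav_command` are hypothetical, and writing to `cmd_term.vel_command_b` and calling `actor(obs)` are omitted because they need a running Isaac Sim environment.

```python
import math


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def nav_command(target_yaw, cur_yaw):
    """Return (v_fwd, v_lat, w_z) for a yaw target; angles in radians."""
    # Wrap the heading error to [-pi, pi) so the robot turns the short way.
    yaw_err = (target_yaw - cur_yaw + math.pi) % (2.0 * math.pi) - math.pi
    # Slow down while misaligned, but never drop below 0.2 m/s.
    v_fwd = clip(0.8 * math.cos(yaw_err), 0.2, 0.8)
    # Proportional yaw-rate command, saturated at the V2 training range.
    w_z = clip(1.5 * yaw_err, -2.0, 2.0)
    return v_fwd, 0.0, w_z


for target_deg in (0, 45, 90, 180):
    v_fwd, _, w_z = nav_command(math.radians(target_deg), 0.0)
    print(f"target {target_deg:3d} deg: v_fwd={v_fwd:.2f} m/s, w_z={w_z:+.2f} rad/s")
```

With a gain of 1.5, `w_z` saturates at 2.0 rad/s once the yaw error exceeds about 1.33 rad (roughly 76°). Under V1's ±0.5 rad/s training range the same gain would leave the trained range at about 0.33 rad (roughly 19°), which is why this controller needs the wider V2 range.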

Training data

On-policy RL — no offline dataset. The full task definition lives in:

Repo: https://github.com/aws300/go2_z1_warehouse
V2 task config: go2_z1_warehouse/stage1_walking/flat_v2_env_cfg.py
Auto-pipeline launcher: go2_z1_warehouse/v2_pipeline/auto_chain.sh

Predecessor

V1 (yaw rate limited to ±0.5 rad/s, no heading command): m3/go2z1-walking-rsl-rl-v1

Citation

@misc{go2z1-walking-v2,
  title  = {Go2+Z1 Walking Policy V2 (rotation-capable + heading-tracking)},
  author = {m3},
  year   = {2026},
  url    = {https://huggingface.co/m3/go2z1-walking-rsl-rl-v2}
}

