FlowDiT V3 Humanoid: Video-to-Navigation (GENESIS)
Part of the GENESIS research framework: video-conditioned robot learning.
Paper: Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion (IROS 2026)
Code: github.com/jeffrinsam/GENESIS, under part2_navigation/flow_constrained_v3_humanoid/
Model Description
FlowDiT V3 Humanoid is an inference-optimized Diffusion Transformer specialized for bipedal humanoid navigation (Unitree G1). It extends FlowDiT V2 with humanoid-specific motion constraints and whole-body balance priors.
Architecture:
- Visual encoder: DINOv2-ViT-B/14 (frozen)
- Flow encoder: RAFT optical flow with humanoid-specific temporal attention
- DiT backbone: Enlarged Diffusion Transformer with balance-constraint cross-attention
- Output: 3-DOF velocity command [vx, vy, yaw_rate] + gait phase signal
Target robot: Unitree G1 humanoid (inference only; see code for the training pipeline).
Runtime: PyTorch 2.9.1+cu128; inference requires ~4 GB of VRAM.
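To make the output listed under Architecture concrete, here is a minimal sketch of how a downstream controller might hold and sanity-check one model output. It is not taken from the GENESIS code: the 4-element layout [vx, vy, yaw_rate, gait_phase], the units, the [0, 1) phase range, and the names `HumanoidCommand`, `parse_command`, `max_lin`, and `max_yaw` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanoidCommand:
    """One policy output: planar velocity, yaw rate, and gait phase."""
    vx: float          # forward velocity (m/s assumed)
    vy: float          # lateral velocity (m/s assumed)
    yaw_rate: float    # turning rate (rad/s assumed)
    gait_phase: float  # assumed to be a phase in [0, 1)


def clamp(value: float, limit: float) -> float:
    """Clip value to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


def parse_command(raw, max_lin=1.0, max_yaw=1.0) -> HumanoidCommand:
    """Convert an assumed [vx, vy, yaw_rate, gait_phase] sequence.

    The limits are hypothetical placeholders, not Unitree G1 specifications.
    """
    if len(raw) != 4:
        raise ValueError(f"expected 4 values, got {len(raw)}")
    vx, vy, yaw_rate, phase = (float(x) for x in raw)
    return HumanoidCommand(
        vx=clamp(vx, max_lin),
        vy=clamp(vy, max_lin),
        yaw_rate=clamp(yaw_rate, max_yaw),
        gait_phase=phase % 1.0,  # wrap the phase into [0, 1)
    )
```

Clamping before the command reaches the robot is a common safety habit; the real limits should come from the robot's documentation rather than from the defaults above.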
Performance
Evaluated on Unitree G1 in Isaac Sim navigation tasks:
| Metric | Value |
|---|---|
| Success Rate (SR @ 3.0 m) | 100% |
| SR @ 1.0 m (post-processed) | ~39% |
| Avg Trajectory Error (ATE) | 0.38 m |
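The gap between the two success rates follows from how a distance threshold works. The sketch below assumes that SR @ d is the fraction of episodes ending within d metres of the goal and that ATE is the mean Euclidean distance between matched waypoints. These are common definitions, but the paper may define them differently, and the "post-processed" step is not modeled here. The function names and sample distances are made up for illustration.

```python
import math


def success_rate(final_distances, threshold_m):
    """Fraction of episodes ending within threshold_m of the goal."""
    if not final_distances:
        raise ValueError("need at least one episode")
    hits = sum(1 for d in final_distances if d <= threshold_m)
    return hits / len(final_distances)


def average_trajectory_error(predicted, reference):
    """Mean Euclidean distance between matched waypoints."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Made-up final distances (metres) for four episodes, for illustration only.
distances = [0.4, 0.9, 1.6, 2.8]
print(success_rate(distances, 3.0))  # 1.0
print(success_rate(distances, 1.0))  # 0.5
```

With these made-up numbers every episode ends within 3.0 m, but only half end within 1.0 m, which is the same pattern the table shows at a different scale.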
Usage
# Activate the V3 inference venv (torch 2.9.1+cu128)
cd GENESIS/part2_navigation/flow_constrained_v3_humanoid
source .venv/bin/activate
python infer_humanoid.py \
--checkpoint flowdit_v3_humanoid_best.pt \
--goal_video goal.mp4 \
--current_obs obs.jpg
Download via the GENESIS checkpoint script:
bash scripts/download_checkpoints.sh
Checkpoint Details
| File | Size | Format |
|---|---|---|
| flowdit_v3_humanoid_best.pt | 982 MB | PyTorch state dict + config |
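Since the file is described as a state dict plus config, it can be useful to inspect its contents before wiring it into a model. The following is a sketch only: the top-level key names `state_dict` and `config` are guesses, not confirmed by the repository, and `load_checkpoint` and `summarize_checkpoint` are hypothetical helpers.

```python
def load_checkpoint(path):
    """Load the checkpoint onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so the summary helper works without it
    # Recent PyTorch versions default to weights_only=True, which may reject
    # a config that stores custom Python objects.
    return torch.load(path, map_location="cpu")


def summarize_checkpoint(ckpt, state_key="state_dict", config_key="config"):
    """Report which assumed top-level entries are present."""
    if not isinstance(ckpt, dict):
        raise TypeError("expected a dict-like checkpoint")
    state = ckpt.get(state_key, {})
    return {
        "top_level_keys": sorted(ckpt),
        "has_config": config_key in ckpt,
        "num_tensors": len(state),
    }
```

A typical call would be `summarize_checkpoint(load_checkpoint("flowdit_v3_humanoid_best.pt"))`. If the reported keys differ from the assumed ones, pass the actual names via `state_key` and `config_key`.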
Citation
@inproceedings{sam2026actionagent,
title = {Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion},
author = {Sam, Jeffrin and Khang, Nguyen and Mahmoud, Yara and
Altamirano Cabrera, Miguel and Tsetserukou, Dzmitry},
booktitle = {2026 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS)},
year = {2026},
note = {arXiv:2605.01477}
}
License
Apache 2.0. See LICENSE.