FlowDiT V3 Humanoid: Video-to-Navigation (GENESIS)

Part of GENESIS, a research framework for video-conditioned robot learning.

Paper: Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion (IROS 2026)

Code: github.com/jeffrinsam/GENESIS → part2_navigation/flow_constrained_v3_humanoid/

Model Description

FlowDiT V3 Humanoid is an inference-optimized Diffusion Transformer specialized for bipedal humanoid navigation (Unitree G1). It extends FlowDiT V2 with humanoid-specific motion constraints and whole-body balance priors.

Architecture:

  • Visual encoder: DINOv2-ViT-B/14 (frozen)
  • Flow encoder: RAFT optical flow with humanoid-specific temporal attention
  • DiT backbone: Enlarged Diffusion Transformer with balance-constraint cross-attention
  • Output: 3-DOF velocity command [vx, vy, yaw_rate] + gait phase signal
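The output bullet above can be made concrete with a small decoding helper. This is a minimal sketch under a hypothetical assumption: the raw output is taken to be five values laid out as [vx, vy, yaw_rate, sin φ, cos φ], with the gait phase φ encoded on the unit circle. The real tensor layout, units, and limits are defined in the repository code and may differ, and the `VelocityLimits` defaults are placeholders, not Unitree G1 specifications.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VelocityLimits:
    # Hypothetical clamp limits (m/s, m/s, rad/s); replace with the robot's real limits.
    vx: float = 1.0
    vy: float = 0.5
    yaw_rate: float = 1.0


@dataclass(frozen=True)
class HumanoidCommand:
    vx: float          # forward velocity (m/s, assumed unit)
    vy: float          # lateral velocity (m/s, assumed unit)
    yaw_rate: float    # turning rate (rad/s, assumed unit)
    gait_phase: float  # phase angle wrapped into [0, 2*pi)


def _clamp(x: float, limit: float) -> float:
    return max(-limit, min(limit, x))


def decode_output(raw: list[float], limits: VelocityLimits = VelocityLimits()) -> HumanoidCommand:
    """Split a hypothetical [vx, vy, yaw_rate, sin_phi, cos_phi] vector into a command."""
    if len(raw) != 5:
        raise ValueError(f"expected 5 values, got {len(raw)}")
    vx, vy, yaw_rate, s, c = raw
    # atan2 recovers the angle from its (sin, cos) encoding; % wraps it into [0, 2*pi).
    phase = math.atan2(s, c) % (2.0 * math.pi)
    return HumanoidCommand(
        vx=_clamp(vx, limits.vx),
        vy=_clamp(vy, limits.vy),
        yaw_rate=_clamp(yaw_rate, limits.yaw_rate),
        gait_phase=phase,
    )


cmd = decode_output([0.6, -0.8, 0.2, 1.0, 0.0])
print(cmd)
```

Encoding a periodic quantity such as gait phase as (sin, cos) avoids the discontinuity at 0/2π that a raw angle regression would have; whether FlowDiT V3 does this is not stated on this card.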

Target robot: Unitree G1 humanoid (inference only; see the repository code for the training pipeline).

Runtime: PyTorch 2.9.1+cu128; inference requires about 4 GB of VRAM.

Performance

Evaluated with the Unitree G1 on navigation tasks in Isaac Sim:

| Metric                         | Value  |
|--------------------------------|--------|
| Success rate (SR @ 3.0 m)      | 100%   |
| SR @ 1.0 m (post-processed)    | ~39%   |
| Average trajectory error (ATE) | 0.38 m |
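The card does not define these metrics precisely, so the sketch below assumes common definitions: an episode succeeds at radius r if the robot's final position lies within r metres of the goal, and ATE is the mean Euclidean distance between time-aligned executed and reference positions. The helper names and toy data are illustrative only and do not reproduce the reported numbers; the "post-processed" variant of SR @ 1.0 m is not modelled.

```python
import math
from typing import Sequence

Point = tuple[float, float]  # (x, y) in metres


def success_rate(final_positions: Sequence[Point], goals: Sequence[Point], radius_m: float) -> float:
    """Assumed definition: fraction of episodes ending within radius_m of the goal."""
    if len(final_positions) != len(goals) or not goals:
        raise ValueError("need one final position per goal, and at least one episode")
    # An episode exactly on the boundary counts as a success (<=).
    hits = sum(math.dist(p, g) <= radius_m for p, g in zip(final_positions, goals))
    return hits / len(goals)


def average_trajectory_error(executed: Sequence[Point], reference: Sequence[Point]) -> float:
    """Assumed definition: mean point-wise Euclidean error of two time-aligned trajectories."""
    if len(executed) != len(reference) or not executed:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(a, b) for a, b in zip(executed, reference)) / len(executed)


# Toy data: three episodes whose final distances to the goal are 0.5 m, 2.0 m and 4.0 m.
goals = [(5.0, 0.0), (5.0, 0.0), (5.0, 0.0)]
finals = [(4.5, 0.0), (3.0, 0.0), (1.0, 0.0)]
print(success_rate(finals, goals, 3.0))  # 2 of 3 episodes succeed
print(success_rate(finals, goals, 1.0))  # 1 of 3 episodes succeeds
print(average_trajectory_error([(0.0, 0.0), (1.0, 0.3)], [(0.0, 0.0), (1.0, 0.0)]))
```

The toy data shows why SR falls as the radius tightens: the same rollouts are scored against a stricter threshold.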

Usage

# Activate the V3 inference venv (torch 2.9.1+cu128)
cd GENESIS/part2_navigation/flow_constrained_v3_humanoid
source .venv/bin/activate

python infer_humanoid.py \
  --checkpoint flowdit_v3_humanoid_best.pt \
  --goal_video goal.mp4 \
  --current_obs obs.jpg
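To run the command above over several observation frames, one option is to wrap it with `subprocess`. This sketch uses only the three documented flags; the helper names `build_infer_cmd` and `run_over_frames` are hypothetical, it assumes it is launched from `flow_constrained_v3_humanoid/` with the venv active, and it does not parse the script's output because that format is not documented here.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_cmd(checkpoint: str, goal_video: str, current_obs: str) -> list[str]:
    """Assemble the documented infer_humanoid.py invocation as an argv list."""
    return [
        sys.executable,  # interpreter of the currently active venv
        "infer_humanoid.py",
        "--checkpoint", checkpoint,
        "--goal_video", goal_video,
        "--current_obs", current_obs,
    ]


def run_over_frames(frames_dir: str,
                    checkpoint: str = "flowdit_v3_humanoid_best.pt",
                    goal_video: str = "goal.mp4") -> None:
    """Hypothetical helper: invoke the script once per .jpg frame, stopping on the first failure."""
    for frame in sorted(Path(frames_dir).glob("*.jpg")):
        cmd = build_infer_cmd(checkpoint, goal_video, str(frame))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    print(build_infer_cmd("flowdit_v3_humanoid_best.pt", "goal.mp4", "obs.jpg"))
```

Each call starts a fresh process that reloads the 982 MB checkpoint, so this trades throughput for relying only on the documented command-line interface.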

Download the checkpoint with the GENESIS checkpoint script:

bash scripts/download_checkpoints.sh

Checkpoint Details

| File                        | Size   | Format                      |
|-----------------------------|--------|-----------------------------|
| flowdit_v3_humanoid_best.pt | 982 MB | PyTorch state dict + config |
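The checkpoint is described as a state dict plus config, but the key names are not documented on this card. The sketch below assumes, hypothetically, a top-level dict with a `config` entry and weights under `state_dict` or `model`, falling back to treating every other entry as weights. Loading itself would be `torch.load(path, map_location="cpu")`; since PyTorch 2.6, `torch.load` defaults to `weights_only=True`, which rejects pickled custom classes, so a config stored as a custom object would need `weights_only=False`, and only for a trusted file.

```python
from typing import Any, Mapping

# Hypothetical key names; inspect the real checkpoint (e.g. print(ckpt.keys())) to confirm.
WEIGHT_KEYS = ("state_dict", "model")
CONFIG_KEY = "config"


def split_checkpoint(ckpt: Mapping[str, Any]) -> tuple[Mapping[str, Any], dict[str, Any]]:
    """Separate model weights from a bundled config dict in an already-loaded checkpoint."""
    config = dict(ckpt.get(CONFIG_KEY) or {})  # assumes the config is a plain mapping
    for key in WEIGHT_KEYS:
        if key in ckpt:
            return ckpt[key], config
    # Fallback: treat every non-config entry as a weight tensor.
    weights = {k: v for k, v in ckpt.items() if k != CONFIG_KEY}
    return weights, config


# Typical usage with PyTorch (not executed here):
#   import torch
#   ckpt = torch.load("flowdit_v3_humanoid_best.pt", map_location="cpu")
#   weights, config = split_checkpoint(ckpt)
#   model.load_state_dict(weights)  # `model` is built from `config` by the repository code
```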

Citation

@inproceedings{sam2026actionagent,
  title     = {Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion},
  author    = {Sam, Jeffrin and Khang, Nguyen and Mahmoud, Yara and
               Altamirano Cabrera, Miguel and Tsetserukou, Dzmitry},
  booktitle = {2026 IEEE/RSJ International Conference on Intelligent Robots
               and Systems (IROS)},
  year      = {2026},
  note      = {arXiv:2605.01477}
}

License

Apache 2.0. See LICENSE.
