KIMODO - Kinematic Motion Diffusion

KIMODO is integrated as an hftrainer runtime wrapper around a vendored copy of NVIDIA's official Python runtime under hftrainer.models.motion.kimodo._vendor. Unlike the T2M-only baselines, the KIMODO pipeline is multi-task: the same bundle covers plain text-to-motion generation and KIMODO's kinematic-control interfaces.

Implementation status: the artifact inference path no longer depends on ref_repo. KIMODOBundle imports the vendored runtime from the hftrainer package. The SOMA-RP artifact packages the checkpoint snapshot and the local text encoders. The G1 and SMPL-X artifacts package their own checkpoint snapshots and resolve the shared KIMODO text encoders from the SOMA-RP artifact on first model load.

| Item | Value |
| --- | --- |
| Bundle / Pipeline | KIMODOBundle / KIMODOPipeline |
| Processed HF artifacts | [SOMA-RP](https://huggingface.co/ZeyuLing/hftrainer-kimodo-soma-rp), [G1-RP](https://huggingface.co/ZeyuLing/hftrainer-kimodo-g1-rp), [G1-SEED](https://huggingface.co/ZeyuLing/hftrainer-kimodo-g1-seed), [SMPLX-RP](https://huggingface.co/ZeyuLing/hftrainer-kimodo-smplx-rp) (private / license review) |
| Default model | Kimodo-SOMA-RP-v1 |
| Runtime implementation | vendored under hftrainer.models.motion.kimodo._vendor.kimodo |
| Supported skeletons | SOMA, Unitree G1, SMPL-X |
| Text encoder | LLM2Vec / Meta-Llama-3 local text encoder; stored in SOMA-RP and shared by the G1 and SMPL-X artifacts |
| SMPL mesh bridge | hftrainer.motion.retarget.KIMODOSOMAToSMPLRetargeter |

Weights

The SOMA-RP model-zoo artifact is self-contained: it includes the KIMODO checkpoint snapshot and the text-encoder files needed by KIMODO's local encoder. The G1 and SMPL-X artifacts store their own KIMODO checkpoint snapshots and point text_encoders_repo at the SOMA-RP artifact, which avoids duplicating the same 16 GB LLM2Vec / Meta-Llama tree in every KIMODO repository.
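The lookup order described above can be sketched as a small resolver. It is an illustration, not KIMODOBundle's implementation: it assumes the text_encoders_repo value is stored in kimodo_config.json and that an artifact-local text_encoders/ directory, when present, takes precedence. The function name text_encoder_source is hypothetical.

```python
import json
from pathlib import Path
from typing import Tuple


def text_encoder_source(artifact_dir: str) -> Tuple[str, str]:
    """Decide where KIMODO's text encoders come from for one artifact.

    Hypothetical sketch: returns ("local", path) when the artifact ships
    text_encoders/ (SOMA-RP), otherwise ("hub", repo_id) read from an
    assumed "text_encoders_repo" key in kimodo_config.json (G1, SMPL-X).
    """
    root = Path(artifact_dir)
    local_dir = root / "text_encoders"
    if local_dir.is_dir():
        return "local", str(local_dir)
    config = json.loads((root / "kimodo_config.json").read_text())
    repo_id = config.get("text_encoders_repo")
    if not repo_id:
        raise FileNotFoundError(
            f"{root}: no text_encoders/ directory and no text_encoders_repo in kimodo_config.json"
        )
    return "hub", repo_id
```

Keeping the encoders in one repo means only SOMA-RP carries the 16 GB tree; the trade-off is that loading a G1 or SMPL-X artifact also needs read access to SOMA-RP.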

| Variant | Processed Hugging Face artifact | Native skeleton | Status |
| --- | --- | --- | --- |
| SOMA-RP | [ZeyuLing/hftrainer-kimodo-soma-rp](https://huggingface.co/ZeyuLing/hftrainer-kimodo-soma-rp) | SOMA | packaged for hftrainer, T2M visual path checked; private pending license review |
| G1-RP | [ZeyuLing/hftrainer-kimodo-g1-rp](https://huggingface.co/ZeyuLing/hftrainer-kimodo-g1-rp) | Unitree G1 | packaged for hftrainer; reuses shared text encoders from SOMA-RP |
| G1-SEED | [ZeyuLing/hftrainer-kimodo-g1-seed](https://huggingface.co/ZeyuLing/hftrainer-kimodo-g1-seed) | Unitree G1 | packaged for hftrainer; reuses shared text encoders from SOMA-RP |
| SMPLX-RP | [ZeyuLing/hftrainer-kimodo-smplx-rp](https://huggingface.co/ZeyuLing/hftrainer-kimodo-smplx-rp) | SMPL-X | packaged for hftrainer; reuses shared text encoders from SOMA-RP |

Because the KIMODO artifacts depend on third-party text-encoder weights, their redistribution status follows the upstream KIMODO, LLM2Vec, and Meta Llama licenses. The uploaded hftrainer repos are kept private until those redistribution terms are reviewed. The Meta-Llama text encoder is stored once in standard Transformers safetensors shards inside the SOMA-RP artifact; duplicate upstream original/*.pth exports are intentionally omitted.
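The "store once, skip duplicates" rule can be expressed as a path filter applied while packaging text_encoders/. This is a sketch of the rule as stated, not the converter's actual code; the helper name keep_text_encoder_file and the exact glob patterns are assumptions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed exclusion rule: upstream "original/*.pth" exports duplicate the
# Transformers safetensors shards, so they are skipped at packaging time.
EXCLUDED_PATTERNS = ("original/*.pth", "*/original/*.pth")


def keep_text_encoder_file(relpath: str) -> bool:
    """Return True if a file under text_encoders/ should be packaged.

    fnmatch's "*" also matches "/", so "*/original/*.pth" covers nested trees.
    """
    path = PurePosixPath(relpath).as_posix()
    return not any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS)
```

Only .pth files under an original/ directory are dropped; metadata such as original/params.json is still kept by this rule.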

Loading

Load from Hugging Face with the same from_pretrained style used by the other hftrainer model-zoo entries:

```python
from hftrainer.pipelines.motion.kimodo_pipeline import KIMODOPipeline

pipe = KIMODOPipeline.from_pretrained(
    "ZeyuLing/hftrainer-kimodo-soma-rp",
    device="cuda",
)
out = pipe.text_to_motion("a person walks forward.", num_frames=150)
```

Use the matching repo id for Unitree G1 or SMPL-X variants:

```python
from hftrainer.pipelines.motion.kimodo_pipeline import KIMODOPipeline

# Unitree G1 (RP or SEED variant).
pipe = KIMODOPipeline.from_pretrained(
    "ZeyuLing/hftrainer-kimodo-g1-rp",  # or ZeyuLing/hftrainer-kimodo-g1-seed
    device="cuda",
)
out = pipe.text_to_motion("a humanoid robot walks forward.", num_frames=150)

# SMPL-X.
pipe = KIMODOPipeline.from_pretrained(
    "ZeyuLing/hftrainer-kimodo-smplx-rp",
    device="cuda",
)
out = pipe.text_to_motion("a person walks forward.", num_frames=150)
```

For G1 and SMPL-X artifacts, KIMODOBundle.from_pretrained resolves the shared text_encoders/ tree from ZeyuLing/hftrainer-kimodo-soma-rp when the model is loaded. Because these repos are private during license review, run with an authenticated Hugging Face token that has read access to the KIMODO hftrainer repos.

The lower-level bundle is still available when you need direct access to KIMODO runtime paths:

```python
from hftrainer.models.motion.kimodo import KIMODOBundle
from hftrainer.pipelines.motion.kimodo_pipeline import KIMODOPipeline

bundle = KIMODOBundle.from_pretrained(
    "ZeyuLing/hftrainer-kimodo-soma-rp",
    device="cuda",
)
pipe = KIMODOPipeline(bundle)
```

For environments that need an explicit local snapshot:

```python
from huggingface_hub import snapshot_download
from hftrainer.models.motion.kimodo import KIMODOBundle

path = snapshot_download("ZeyuLing/hftrainer-kimodo-soma-rp")
bundle = KIMODOBundle.from_pretrained(path, device="cuda")
```

Local construction from official NVIDIA checkpoint folders is still available:

```python
from hftrainer.models.motion.kimodo import KIMODOBundle

bundle = KIMODOBundle.from_pretrained(
    "Kimodo-SOMA-RP-v1",
    device="cuda",
    text_encoder_mode="local",
    checkpoint_dir="checkpoints/kimodo/local_models",
    text_encoders_dir="checkpoints/kimodo/text_encoders",
)
```

Supported Tasks

The wrapper exposes the task surface KIMODO itself supports:

| Task | Pipeline API | KIMODO control |
| --- | --- | --- |
| Text-to-motion | `text_to_motion()` / `__call__` | prompt only |
| Multi-prompt motion | `multi_prompt()` | segment stitching with transition frames |
| Full-body keyframes | `fullbody_keyframe_constraint()` + `constrained_motion()` | `FullBodyConstraintSet` |
| End-effector control | `end_effector_constraint()` / hand-foot helpers | `EndEffectorConstraintSet` |
| 2D root path / waypoints | `root2d_constraint()` + `constrained_motion()` | `Root2DConstraintSet` |
| Saved KIMODO demo constraints | `constraints_from_json()` + `constrained_motion()` | official constraint JSON |
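Multi-prompt requests are easier to inspect with the nominal timeline in hand. The sketch assumes the second argument of multi_prompt() lists per-segment frame counts (as [90, 90] in the example below suggests) and that transition frames are carved out around each segment boundary rather than appended. Both are assumptions, and segment_layout is a hypothetical helper, not part of KIMODOPipeline.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) frame range


def segment_layout(
    segment_frames: List[int], transition_frames: int = 0
) -> Tuple[List[Range], List[Range]]:
    """Nominal timeline for a multi-prompt request.

    Hypothetical helper: segments are laid end to end, and each transition
    window is assumed to be centered on a segment boundary. KIMODO's own
    stitching may place or size transitions differently.
    """
    segments: List[Range] = []
    start = 0
    for n in segment_frames:
        segments.append((start, start + n))
        start += n
    total = start
    half = transition_frames // 2
    windows = [
        (max(0, end - half), min(total, end + transition_frames - half))
        for _, end in segments[:-1]
    ]
    return segments, windows
```

For two 90-frame prompts and a 10-frame transition, this yields segments (0, 90) and (90, 180) and one window (85, 95) around the cut.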

Examples:

```python
# `pipe` is a KIMODOPipeline loaded as shown in Loading above.

# Multi-prompt stitching.
out = pipe.multi_prompt(
    ["a person walks forward.", "the person turns left."],
    [90, 90],
)

# Constraint JSON produced by the official KIMODO demos.
constraints = pipe.constraints_from_json("constraints.json", device="cuda")
out = pipe.constrained_motion(
    "a person follows the specified path.",
    num_frames=180,
    constraints=constraints,
)

# Programmatic 2D root waypoints.
root2d = pipe.root2d_constraint(
    frame_indices=[0, 60, 120],
    smooth_root_2d=[[0.0, 0.0], [0.8, 0.1], [1.5, 0.6]],
    device="cuda",
)
out = pipe.constrained_motion(
    "a person walks along a curved path.",
    num_frames=150,
    constraints=[root2d],
)
```
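To preview what the sparse waypoints request, they can be densified with linear interpolation. This is a visualization aid only: it assumes smooth_root_2d holds ground-plane (x, z) coordinates (the viewer draws root paths in XZ), and KIMODO treats waypoints as sparse constraints rather than forcing straight segments between them. densify_root_path is a hypothetical helper.

```python
import numpy as np


def densify_root_path(frame_indices, smooth_root_2d, num_frames):
    """Linearly interpolate sparse (frame, [x, z]) waypoints to every frame.

    Frames before the first / after the last waypoint are clamped to the end
    values (np.interp's default). Preview only: not KIMODO's solver.
    """
    frames = np.asarray(frame_indices, dtype=np.float64)
    points = np.asarray(smooth_root_2d, dtype=np.float64)
    if points.shape != (len(frames), 2):
        raise ValueError("expected one [x, z] pair per frame index")
    if np.any(np.diff(frames) <= 0):
        raise ValueError("frame_indices must be strictly increasing")
    t = np.arange(num_frames, dtype=np.float64)
    x = np.interp(t, frames, points[:, 0])
    z = np.interp(t, frames, points[:, 1])
    return np.stack([x, z], axis=1)  # shape (num_frames, 2)
```

With the waypoints from the example, frame 30 lands halfway between the first two points, and frames 120 to 149 stay at the final waypoint.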

Retargeting To SMPL

For mesh inspection and HumanML3D-style evaluation, retarget KIMODO SOMA output through the rotation-aware SOMA-to-SMPL operator:

```python
from hftrainer.motion.retarget import KIMODOSOMAToSMPLRetargeter

retargeter = KIMODOSOMAToSMPLRetargeter(device="cuda")
smpl = retargeter.retarget_file("kimodo_debug_npz/000000.npz")
motion_135 = smpl["motion_135"]
```

The debug NPZ must contain KIMODO global_rot_mats. Position-only IK is a fallback for diagnostics and should not be used as the mesh-quality path.
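A pre-flight check can catch debug exports that lack the rotations the retargeter needs before any retargeting runs. The sketch assumes global_rot_mats is stored as a (frames, joints, 3, 3) array of rotation matrices; that layout is an assumption about the debug NPZ, and check_kimodo_debug_npz is a hypothetical helper, not part of hftrainer.motion.retarget.

```python
import numpy as np


def check_kimodo_debug_npz(path: str, atol: float = 1e-3) -> np.ndarray:
    """Load global_rot_mats and verify they look like proper rotations.

    Assumed layout: (T, J, 3, 3). Raises if the key is missing or malformed.
    """
    with np.load(path) as data:
        if "global_rot_mats" not in data.files:
            raise KeyError(f"{path}: missing global_rot_mats; rotation-aware retargeting is not possible")
        rots = np.asarray(data["global_rot_mats"], dtype=np.float64)
    if rots.ndim != 4 or rots.shape[-2:] != (3, 3):
        raise ValueError(f"{path}: expected (T, J, 3, 3), got {rots.shape}")
    if not np.all(np.isfinite(rots)):
        raise ValueError(f"{path}: non-finite rotation entries")
    # Proper rotations satisfy R^T R = I and det(R) = +1 for every frame/joint.
    ortho_err = np.abs(np.swapaxes(rots, -1, -2) @ rots - np.eye(3)).max()
    det_err = np.abs(np.linalg.det(rots) - 1.0).max()
    if ortho_err > atol or det_err > atol:
        raise ValueError(f"{path}: global_rot_mats are not valid rotation matrices")
    return rots
```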

Artifact Format

KIMODOBundle.save_pretrained() stores a complete runtime artifact:

```text
kimodo_config.json
model_index.json
README.md
kimodo_checkpoint/Kimodo-SOMA-RP-v1/
text_encoders/
```

G1 and SMPL-X artifacts use the same format but omit text_encoders/ locally:

```text
kimodo_config.json
model_index.json
README.md
kimodo_checkpoint/Kimodo-G1-RP-v1/
```

KIMODOBundle.from_pretrained(path_or_repo) resolves the artifact-local kimodo_checkpoint/ and text_encoders/ directories into KIMODO's CHECKPOINT_DIR and TEXT_ENCODERS_DIR environment hooks before calling the official loader.
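The resolution step can be pictured as: validate the artifact layout listed above, then export its directories through the two environment variables. This is a sketch under assumptions, not KIMODOBundle.from_pretrained itself. export_kimodo_env is a hypothetical name, CHECKPOINT_DIR is assumed to point at kimodo_checkpoint/ (the parent of the per-model folder, mirroring checkpoint_dir plus model name in the local example), and text_encoders_repo is assumed to live in kimodo_config.json.

```python
import json
import os
from pathlib import Path
from typing import Dict

REQUIRED_FILES = ("kimodo_config.json", "model_index.json", "README.md")


def export_kimodo_env(artifact_dir: str, model_name: str) -> Dict[str, str]:
    """Validate an artifact layout, then point KIMODO's env hooks at it.

    Hypothetical helper. TEXT_ENCODERS_DIR is only set when the artifact ships
    text_encoders/; otherwise the config must name a text_encoders_repo
    (assumed key), which is resolved by a separate download step.
    """
    root = Path(artifact_dir).resolve()
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    ckpt_root = root / "kimodo_checkpoint"
    if not (ckpt_root / model_name).is_dir():
        missing.append(f"kimodo_checkpoint/{model_name}/")
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")

    env = {"CHECKPOINT_DIR": str(ckpt_root)}
    text_dir = root / "text_encoders"
    if text_dir.is_dir():
        env["TEXT_ENCODERS_DIR"] = str(text_dir)
    else:
        config = json.loads((root / "kimodo_config.json").read_text())
        if not config.get("text_encoders_repo"):
            raise FileNotFoundError(f"{root}: no text_encoders/ and no text_encoders_repo")
    os.environ.update(env)
    return env
```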

Package the default SOMA-RP model:

```bash
python3 scripts/eval/convert_kimodo_checkpoint.py \
  --model_name Kimodo-SOMA-RP-v1 \
  --checkpoint_dir checkpoints/kimodo/local_models \
  --text_encoders_dir checkpoints/kimodo/text_encoders \
  --out_dir checkpoints/kimodo/hftrainer_soma_rp \
  --verify
```

Package a G1 checkpoint from a local Hugging Face snapshot:

```bash
python3 scripts/eval/convert_kimodo_checkpoint.py \
  --model_name Kimodo-G1-RP-v1 \
  --checkpoint_source checkpoints/kimodo/hub/models--nvidia--Kimodo-G1-RP-v1/snapshots/<sha> \
  --out_dir checkpoints/kimodo/hftrainer_g1_rp \
  --no_text_encoder \
  --text_encoders_repo ZeyuLing/hftrainer-kimodo-soma-rp \
  --copy_mode hardlink \
  --verify
```
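--copy_mode hardlink suggests output files are hard-linked to the snapshot instead of duplicated, which matters for multi-gigabyte checkpoints. The converter's exact behavior is not documented here, so the sketch below only shows the common pattern: try os.link and fall back to a real copy when linking fails (for example across filesystems). place_file is a hypothetical helper.

```python
import os
import shutil
from pathlib import Path


def place_file(src: str, dst: str, copy_mode: str = "hardlink") -> str:
    """Materialize src at dst and report how: "hardlink" or "copy".

    Hypothetical helper illustrating a hardlink-with-fallback copy mode.
    """
    dst_path = Path(dst)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    if dst_path.exists() or dst_path.is_symlink():
        dst_path.unlink()  # replace stale outputs from a previous run
    # Hub snapshot entries are typically symlinks into the cache's blobs/
    # directory, so link or copy the underlying file, not the symlink.
    real_src = os.path.realpath(src)
    if copy_mode == "hardlink":
        try:
            os.link(real_src, dst_path)
            return "hardlink"
        except OSError:
            pass  # e.g. cross-device link, or a filesystem without hardlinks
    shutil.copy2(real_src, dst_path)
    return "copy"
```

A hard link shares storage with the cache entry, so clearing the hub cache later does not remove the packaged file, while in-place edits to either name affect both.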

Evaluation Status

Current status is intentionally split by task. The trusted HumanML3D T2M numbers below are from the SMPL-X-RP hftrainer artifact, exported through the SMPL-X/SOMA-to-SMPL bridge and then scored with the persisted hftrainer evaluators.

HumanML3D T2M Metrics (SMPL-X-RP, 2026-06-16)

Prediction/evaluation root: outputs/evaluation/kimodo_smplx_hml3d_smpl_ms272_v100x1_20260616.

| Evaluator | Samples | FID ↓ | R-Precision Top-1 / 2 / 3 ↑ | Diversity → | MM-Dist ↓ | Metric JSON |
| --- | ---: | ---: | --- | ---: | ---: | --- |
| HumanML3D-263 | 2,478 | 1.8425 | 0.3135 / 0.4818 / 0.5925 | 9.1488 | 4.2810 | metrics/verify_hml263.json |
| MotionStreamer-272 | 7,392 | 143.9169 | 0.3225 / 0.4601 / 0.5413 | 25.3156 | 21.7065 | metrics/verify_ms272.json |

The MotionStreamer-272 score uses the explicit KIMODO SMPL-X output -> SMPL motion_135 -> MotionStreamer-272 conversion chain. The GT (real) sanity row in the same run matches the persisted MotionStreamer evaluator contract (R@1=0.706, Diversity≈27.36, MM-Dist≈15.01), so these numbers are suitable for the current model card.
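For reference, the FID column is the Fréchet distance between Gaussians fitted to real and generated evaluator features: ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^½). The NumPy sketch below implements that standard formula; it is not the persisted hftrainer evaluator, and feature extraction with the HumanML3D-263 or MotionStreamer-272 encoders is out of scope.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID between two (N, D) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) equals Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2}),
    # and the inner matrix is symmetric PSD, so eigvalsh is safe.
    root_r = _psd_sqrt(cov_r)
    inner = root_r @ cov_g @ root_r
    inner = (inner + inner.T) / 2.0  # symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)
```

Each evaluator has its own feature space, so the HumanML3D-263 and MotionStreamer-272 FID values should not be compared with each other.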

Reproduce the run with the Taiji submit helper:

```bash
python3 scripts/submit/submit_kimodo_hml3d_smpl_ms272_taiji.py \
  --out-root outputs/evaluation/kimodo_smplx_hml3d_smpl_ms272_v100x1_20260616 \
  --feature-namespace kimodo_smplx_t2m_hml3d_smpl_ms272_20260616 \
  --num-jobs 24 \
  --gpus-per-job 1 \
  --no-submit-cache
```

| Area | Status |
| --- | --- |
| T2M | SMPLX-RP generated with the hftrainer artifact, retargeted to SMPL, and scored with persisted evaluators |
| HumanML3D / MS272 metrics | trusted for SMPLX-RP run above; always copy future values from generated JSON files, not by hand |
| Multi-prompt / root path / keyframe / inbetween controls | API-supported in KIMODOPipeline; visualization must follow the task protocol manifest |
| End-effector controls | API-supported; visualization must render exported end-effector target points |
| Body-part control | unsupported for KIMODO; do not report forced subset constraints as a valid KIMODO task |
| G1-RP / G1-SEED | packaged as hftrainer artifacts; runtime wrapper supports the native Unitree G1 skeleton |
| SMPLX-RP | packaged as an hftrainer artifact; runtime wrapper supports the native SMPL-X skeleton |
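Following the rule above, table rows can be generated from the metric JSON files instead of being typed in. The sketch formats one row; the key names in METRIC_KEYS are hypothetical placeholders, so map them to whatever metrics/verify_hml263.json and metrics/verify_ms272.json actually contain.

```python
import json
from pathlib import Path

# Hypothetical key names; map them to the real schema of the metric JSON.
METRIC_KEYS = {
    "samples": "num_samples",
    "fid": "fid",
    "top1": "top1",
    "top2": "top2",
    "top3": "top3",
    "diversity": "diversity",
    "mm_dist": "mm_dist",
}


def metrics_row(evaluator: str, json_path: str) -> str:
    """Render one Markdown table row directly from a metric JSON file."""
    raw = json.loads(Path(json_path).read_text())
    v = {name: raw[key] for name, key in METRIC_KEYS.items()}
    return (
        f"| {evaluator} | {v['samples']:,} | {v['fid']:.4f} | "
        f"{v['top1']:.4f} / {v['top2']:.4f} / {v['top3']:.4f} | "
        f"{v['diversity']:.4f} | {v['mm_dist']:.4f} | {json_path} |"
    )
```

A missing key raises KeyError instead of silently producing a blank cell, which is the point of not copying values by hand.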

Visualization Protocol

KIMODO visualization is defined by reusable hftrainer motion protocols, not by a one-off web page. The task/panel/frame contracts live in hftrainer.motion.visualization.kimodo and hftrainer.motion.visualization.protocol; the detailed condition families are documented in docs/motion/kimodo_visualization_protocols.md.

KIMODO is not a single T2M viewer case. Each supported task must expose the generated motion plus the task condition that the model was asked to satisfy. The current manifest layout is:

```text
outputs/evaluation/<kimodo-viewer-root>/
  _manifest.json              # protocol rows from hftrainer.motion.visualization
  _captions.json
  gt/<case>.npz               # same-timeline reference / condition-source SMPL motion
  condition_smpl/<case>.npz   # generated-timeline condition source, visible only on constrained frames
  kimodo_smpl/<case>.npz      # generated KIMODO retargeted to SMPL
  kimodo_soma/<case>.npz      # generated native SOMA mesh
```

_manifest.json includes frame_semantics, condition_overlays, panel_visible_ranges, missing_panels, and diagnostics. Missing panels are source/export limitations, not UI bugs. A gt panel is only valid when the reference motion is finite, nonzero, and on the same frame timeline as the generated motion. For transition/prepend tasks, use condition_smpl plus panel_visible_ranges.condition_smpl instead of fabricating full-timeline GT.
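The validity rule for a gt panel translates directly into a check. This sketch assumes both motions are loaded as NumPy arrays whose first axis is the frame index and approximates "same timeline" as an equal frame count; the helper name is hypothetical, and the manifest's own diagnostics may apply additional checks.

```python
import numpy as np


def gt_panel_is_valid(reference: np.ndarray, generated: np.ndarray) -> bool:
    """True if the reference is finite, not all zeros, and on the generated timeline.

    Assumes arrays shaped (frames, ...); equal frame counts stand in for
    "same frame timeline".
    """
    reference = np.asarray(reference)
    generated = np.asarray(generated)
    if reference.size == 0 or not np.all(np.isfinite(reference)):
        return False
    if not np.any(reference != 0):
        return False  # an all-zero placeholder is not a real reference motion
    return reference.shape[0] == generated.shape[0]
```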

Run the viewer:

```bash
PYTHONPATH=$PWD HFTRAINER_SKIP_AUTOREGISTER=1 \
python3 motion_annot_web/m2m_eval_viewer/retarget_smpl_app.py \
  --host 0.0.0.0 --port 8216 \
  --root outputs/evaluation/kimodo_all_tasks_mesh_viewer_20260615_refactor \
  --model gt=gt=GT=#2ca02c \
  --model condition=condition_smpl=Condition-SMPL=#f59e0b \
  --model soma=kimodo_soma=KIMODO-SOMA=#f97316 \
  --model smpl=kimodo_smpl=KIMODO-SMPL=#9467bd \
  --case-mode union --color-mode condition --list-captions
```
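Each --model argument packs four fields into one string. Reading the examples, the layout appears to be key=subdirectory=label=color; for instance condition=condition_smpl=Condition-SMPL=#f59e0b pairs the condition_smpl/ directory with the Condition-SMPL label. That reading is inferred from the command, not documented viewer behavior, and parse_model_spec is a hypothetical helper for validating such arguments in wrapper scripts.

```python
import re
from typing import NamedTuple


class ModelSpec(NamedTuple):
    key: str     # short panel id, e.g. "soma"
    subdir: str  # directory under --root holding <case>.npz files
    label: str   # display name
    color: str   # hex color, e.g. "#f97316"


_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def parse_model_spec(spec: str) -> ModelSpec:
    """Parse an assumed key=subdir=label=#rrggbb viewer spec."""
    parts = spec.split("=")
    if len(parts) != 4:
        raise ValueError(f"expected key=subdir=label=color, got {spec!r}")
    key, subdir, label, color = parts
    if not _HEX_COLOR.fullmatch(color):
        raise ValueError(f"color must look like #rrggbb, got {color!r}")
    return ModelSpec(key, subdir, label, color)
```

Splitting on "=" means labels cannot themselves contain "=", which the example labels respect.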

The current live debug instance on the workstation is served on port 80:

http://21.6.58.73/

Task protocols:

| Task | Condition shown in viewer | Generated output shown |
| --- | --- | --- |
| Text-to-motion | text prompt | full generated motion |
| Full-body keyframes | exact keyframe indices from `keyframe_indices` | generated inter-keyframe motion |
| Inbetween | endpoints (first and last frames) | generated middle frames |
| Transition stitching | `condition_smpl` on A-tail/B-head ranges from `layout_json`; panel hidden during the generated transition | generated transition in `kimodo_soma` and `kimodo_smpl` |
| Prepend start pose | `condition_smpl` on target start pose and conditioned motion-A prefix; panel hidden during the generated prepend transition | generated prepend transition in `kimodo_soma` and `kimodo_smpl` |
| 2D root path | `condition_overlays.root_trajectory`, rendered only on the primary generated SMPL panel as one clean XZ path rail, start/end dots, a current-frame cursor, and one top-down XZ inset; generated body color remains generated output | generated body motion following the path |
| Constraint JSON | sparse saved KIMODO constraints, shown as every-30-frame markers when metadata is sparse | generated constrained motion |
| Body-part control | not shown; KIMODO has no native arbitrary body-part mask task | unsupported |
| Multi-prompt / local edit | segment prompt or edit mask | stitched / edited motion |
| Style edit | style instruction | style-transferred motion |
| End-effector control | `condition_overlays.joint_targets`, rendered only on the primary generated SMPL panel; each frame shows only its active target points as compact colored anchors with a vertical locator line and floor ring | generated motion satisfying sparse targets |
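panel_visible_ranges is what hides the condition_smpl panel during generated transitions and prepends. Assuming each panel's entry is a list of half-open [start, end) frame pairs (the exact encoding in _manifest.json is not specified here), a per-frame visibility mask can be built like this; visibility_mask is a hypothetical helper.

```python
import numpy as np


def visibility_mask(num_frames, ranges):
    """Boolean mask over frames; True where the panel should be drawn.

    Hypothetical encoding: ranges is an iterable of [start, end) pairs.
    Out-of-bounds values are clipped to the sequence.
    """
    mask = np.zeros(num_frames, dtype=bool)
    for start, end in ranges:
        start, end = max(0, int(start)), min(num_frames, int(end))
        if start < end:
            mask[start:end] = True
    return mask
```

For a transition case, the hidden gap between the A-tail and B-head ranges is where only kimodo_soma and kimodo_smpl are drawn.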

Condition frames are markers or overlays on the generated sequence, not a separate motion stream. If the mesh visibly jumps around a keyframe marker, that is not expected viewer behavior: check diagnostics.continuity in the manifest, then debug the KIMODO constraint / retarget export for that sample.
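A simple numeric check can confirm whether a jump is in the exported data rather than the viewer. The sketch flags frames whose mean per-joint displacement is far above the sequence median; it is a hypothetical stand-in for diagnostics.continuity (whose actual fields are not described here), it assumes (frames, joints, 3) joint positions, and the threshold factor is an arbitrary choice.

```python
import numpy as np


def find_jumps(joints: np.ndarray, factor: float = 5.0, min_step: float = 1e-6):
    """Return frame indices t where the motion jumps between t-1 and t.

    joints: (frames, num_joints, 3) positions. A step is flagged when its mean
    joint displacement exceeds `factor` times the median step (floored at min_step).
    """
    steps = np.linalg.norm(np.diff(joints, axis=0), axis=-1).mean(axis=-1)
    if steps.size == 0:
        return []
    threshold = factor * max(float(np.median(steps)), min_step)
    return [int(i) + 1 for i in np.flatnonzero(steps > threshold)]
```

Running it on both the kimodo_soma and kimodo_smpl exports of a case helps localize the problem: a jump in both suggests the constraint or generation, while a jump only in the SMPL export points at the retarget.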

The viewer consumes _manifest.json instead of guessing from directory names. Each panel has an explicit role (reference, generated, generated_native), and each case has frame_semantics so discrete keyframes/endpoints and continuous-control tasks are displayed differently.

Relevant scripts:

  • scripts/submit/submit_kimodo_hml3d_smpl_ms272_taiji.py submits the current KIMODO T2M -> SOMA -> SMPL -> HumanML3D/MS272 path.
  • scripts/analysis/build_kimodo_task_mesh_viewer.py assembles a compact task-protocol-aware GT / KIMODO-SMPL / optional KIMODO-SOMA visual sanity fixture for the supported KIMODO task surface.
  • scripts/submit/submit_kimodo_t2m_eval.py keeps the older T2M submission entry point.
  • scripts/kimodo/run_kimodo_all_tasks.py and tools/run_kimodo_all_tasks.py cover the broader KIMODO task family.
  • scripts/eval/run_kimodo_tp2m_table2.sh covers prefix-pose + text generation.
  • scripts/eval/run_e10_kimodo_batch.sh and scripts/eval/run_e10_kimodo_h3d500_metrics.sh cover the E10 bridge path.