TEXEDO — Checkpoints
Test-Time Scaling for Controller-Aware Language-Conditioned Humanoid Motion Generation
This repository hosts the pretrained checkpoints and runtime assets for TEXEDO, a text-to-motion pipeline for the Unitree G1 humanoid. Given a language prompt, TEXEDO generates multiple candidate motions, decodes them into a 36-dimensional G1 robot motion format, scores them with dynamic and semantic verifiers, and selects the best candidate for deployment.
- 🌐 Project page: https://jianuocao.github.io/TEXEDO/
- 💻 Code: https://github.com/JianuoCao/TEXEDO
- 📄 Paper: https://arxiv.org/abs/2606.22998
- 📦 Dataset: https://huggingface.co/datasets/JianuoCao/TEXEDO
Contents
| Logical name | What it is | Approx. size |
|---|---|---|
| `fsq_tokenizer` | FSQ motion tokenizer (encoder/decoder + codebook) for 36-dim G1 motion | ~216 MB |
| `fsq_norm_stats` | Per-channel normalization stats for the tokenizer | ~2 KB |
| `generator` | Stage-2 text→motion generator: flan-t5-base fine-tuned on FSQ motion tokens (multi-task) | ~3.2 GB |
| `dynamic_verifier` | Dynamic-feasibility (physical-plausibility) scorer | ~40 MB |
| `dynamic_norm_stats` | Normalization stats paired with the dynamic verifier | ~2 KB |
| `semantic_evaluator` | Text–motion matching evaluator (match net + decomposition + meta) | variable |
| `glove` | GloVe vocab for the semantic text encoder | ~20 MB |
| `g1_robot` | Unitree G1 MuJoCo model (XML + meshes) | ~26 MB |
The base LM `google/flan-t5-base` is loaded from the public Hub at runtime and is not re-hosted here.
Usage
The checkpoints are designed to be fetched automatically by the TEXEDO code:
git clone https://github.com/JianuoCao/TEXEDO.git
cd TEXEDO
conda env create -f environment.yml
conda activate TEXEDO
pip install -e .
# Downloads these checkpoints + runtime assets into ./assets
python scripts/download_assets.py
Then run the full generate → score → select → render pipeline:
python -m pipeline.generate --prompt "a person waves with the right hand" --num-samples 8 --out-dir candidates/
python -m pipeline.score --motion-dir candidates/ --caption "a person waves with the right hand" --output scores.csv
python -m pipeline.select_best_of_n --scores scores.csv --motion-dir candidates/ --copy-best-to best/
python scripts/visualize_csv.py --input-dir best/ --output-dir viz/
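To make the select step concrete, here is a minimal best-of-N sketch over a scores table. It is not the logic of `pipeline.select_best_of_n`: the column names (`motion`, `dynamic_score`, `semantic_score`), the weighted-sum rule, and the weights are all assumptions made for illustration, and the real `scores.csv` schema may differ.

```python
import csv
import io


def select_best(rows, w_dynamic=0.5, w_semantic=0.5):
    """Return the row with the highest weighted sum of the two verifier scores.

    The weighted sum is an assumed combination rule, not the one TEXEDO uses.
    """
    def combined(row):
        return (w_dynamic * float(row["dynamic_score"])
                + w_semantic * float(row["semantic_score"]))

    return max(rows, key=combined)


# Toy stand-in for scores.csv; column names and values are hypothetical.
EXAMPLE_CSV = """motion,dynamic_score,semantic_score
cand_0.csv,0.9,0.2
cand_1.csv,0.7,0.8
cand_2.csv,0.4,0.9
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
best = select_best(rows)
print(best["motion"])
```

Shifting the weights changes which candidate wins, which is the trade-off between physical feasibility and prompt fidelity that the two verifiers expose.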
You can also download a single file directly:
from huggingface_hub import hf_hub_download
ckpt = hf_hub_download(
    repo_id="JianuoCao/TEXEDO-Checkpoint",
    filename="tokenizer/checkpoint_epoch_95.pt",
)
See the repo's docs/MODELS.md for the full asset manifest and layout.
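If you need every file under one folder rather than a single file, `snapshot_download` with `allow_patterns` is one option. The sketch below assumes the `tokenizer/` folder implied by the filename above; the helper names and the `texedo_ckpt` target directory are hypothetical, and other folder names should be checked against the manifest before use.

```python
from pathlib import Path

# Repo id as used in the single-file download example.
REPO_ID = "JianuoCao/TEXEDO-Checkpoint"


def subfolder_patterns(subfolder: str) -> list[str]:
    """Build glob patterns matching every file under one folder of the repo."""
    return [f"{subfolder.strip('/')}/*"]


def fetch_subfolder(subfolder: str, local_dir: str = "texedo_ckpt") -> Path:
    """Download only the files under `subfolder` into `local_dir` (hypothetical path)."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=subfolder_patterns(subfolder),
        local_dir=local_dir,
    )
    return Path(path)


# Example call (requires network access):
# print(fetch_subfolder("tokenizer"))
```

For the full asset set, `scripts/download_assets.py` remains the intended route; this is only useful when you want a subset.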
Citation
@misc{cao2026texedotesttime,
title={TEXEDO: Test-Time Scaling for Controller-Aware Language-Conditioned Humanoid Motion Generation},
author={Jianuo Cao and Yuxin Chen and Yuzhen Song and Masayoshi Tomizuka and Chenran Li and Thomas Tian},
year={2026},
eprint={2606.22998},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2606.22998},
}
License
Released under the MIT license. Third-party datasets, pretrained base models, robot assets, and dependencies retain their own licenses and terms of use.