motion-tf-fsq64k-causal
A causal Transformer VQ-VAE motion tokenizer trained on the MotionMillion dataset. It encodes 272-dimensional human motion sequences into discrete token indices using Finite Scalar Quantization (FSQ) with a codebook of 64,000 entries.
Architecture
- Encoder: 6-layer causal Transformer with RoPE + 4× temporal mean-pool downsampling
- Quantizer: FSQ with levels [8,8,8,5,5,5] → 8·8·8·5·5·5 = 64,000 codes (deterministic, no EMA codebook updates)
- Decoder: 6-layer causal Transformer with 4× learned upsampling
- Input: 272D motion (HumanML3D absolute-root representation, 20 fps)
- Parameters: 68.6M
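To make the codebook size concrete, here is a minimal sketch of how six per-dimension FSQ level indices can be packed into a single token id using mixed-radix arithmetic. The function names and the digit ordering (first dimension least significant) are assumptions for illustration; the ordering used by this checkpoint is not stated on the card.

```python
from math import prod

LEVELS = [8, 8, 8, 5, 5, 5]    # per-dimension level counts from the model card
CODEBOOK_SIZE = prod(LEVELS)   # 8*8*8*5*5*5 = 64000


def digits_to_index(digits, levels=LEVELS):
    """Pack per-dimension level indices into one token id.

    Assumption: mixed-radix packing with the first dimension as the
    least significant digit. The real tokenizer may order them differently.
    """
    if len(digits) != len(levels):
        raise ValueError("expected one digit per FSQ dimension")
    index, base = 0, 1
    for digit, n_levels in zip(digits, levels):
        if not 0 <= digit < n_levels:
            raise ValueError(f"digit {digit} out of range for {n_levels} levels")
        index += digit * base
        base *= n_levels
    return index


def index_to_digits(index, levels=LEVELS):
    """Inverse of digits_to_index: unpack a token id into level indices."""
    if not 0 <= index < prod(levels):
        raise ValueError("index outside the codebook")
    digits = []
    for n_levels in levels:
        digits.append(index % n_levels)
        index //= n_levels
    return digits
```

Because every combination of level indices is a valid code, the codebook is defined by the levels alone. With 4× temporal downsampling at 20 fps, the tokenizer emits 5 tokens per second of motion.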
Usage
Training
- Dataset: MotionMillion (consolidated_v1, absolute-root variant)
- Training steps: 300,000
- Codebook utilization: 99.3% (63,574 / 64,000 codes active)
- Lag-1 token autocorrelation: 0.0011 (near-zero repetition between consecutive tokens)
- Normalized entropy: 0.960
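The card does not state how these token statistics were computed. The sketch below shows one plausible set of definitions, all of which are assumptions: utilization as the fraction of distinct codes observed, normalized entropy as Shannon entropy divided by the log of the codebook size, and a lag-1 repeat rate (the fraction of consecutive token pairs that are identical) as a simple stand-in for the lag-1 autocorrelation figure.

```python
from collections import Counter
from math import log


def codebook_utilization(tokens, codebook_size):
    """Fraction of codebook entries that appear at least once."""
    return len(set(tokens)) / codebook_size


def normalized_entropy(tokens, codebook_size):
    """Shannon entropy of the token histogram divided by log(codebook_size).

    1.0 means perfectly uniform usage; 0.0 means a single code is used.
    """
    total = len(tokens)
    counts = Counter(tokens)
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(codebook_size)


def lag1_repeat_rate(tokens):
    """Fraction of adjacent token pairs that are identical.

    Assumption: used here as a proxy for lag-1 autocorrelation;
    the card's exact definition is not given.
    """
    if len(tokens) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(tokens, tokens[1:]))
    return repeats / (len(tokens) - 1)
```

Under the utilization definition above, 63,574 active codes out of 64,000 gives 63574 / 64000 ≈ 0.993, matching the reported 99.3%.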