motion-tf-fsq64k-causal

A causal Transformer VQ-VAE motion tokenizer trained on the MotionMillion dataset. It encodes 272-dimensional human motion sequences into discrete token indices using Finite Scalar Quantization (FSQ), which yields an implicit codebook of 64,000 entries.

Architecture

  • Encoder: 6-layer causal Transformer with RoPE + 4× temporal mean-pool downsampling
  • Quantizer: FSQ with levels [8,8,8,5,5,5] → 64,000 codes (deterministic, no EMA)
  • Decoder: 6-layer causal Transformer with 4× learned upsampling
  • Input: 272D motion (HumanML3D absolute-root representation, 20 fps)
  • Parameters: 68.6M (stored as F32 in Safetensors format)
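
The codebook size and token rate follow directly from the numbers above. The sketch below works through that arithmetic and shows one way the six per-dimension FSQ levels could be packed into a single token index. The mixed-radix ordering (first dimension least significant) and all function names are assumptions for illustration; the model's actual index layout is not stated here.

```python
from math import prod

# FSQ levels and rates taken from the architecture list above.
LEVELS = [8, 8, 8, 5, 5, 5]
FPS = 20
DOWNSAMPLE = 4


def codebook_size(levels):
    """Number of distinct codes: the product of the per-dimension levels."""
    return prod(levels)


def token_rate(fps=FPS, downsample=DOWNSAMPLE):
    """Tokens per second after temporal downsampling."""
    return fps / downsample


def digits_to_index(digits, levels):
    """Pack per-dimension level indices into one integer (mixed radix).

    Assumption: the first dimension is the least significant digit.
    """
    index, base = 0, 1
    for d, n in zip(digits, levels):
        if not 0 <= d < n:
            raise ValueError(f"digit {d} out of range for {n} levels")
        index += d * base
        base *= n
    return index


def index_to_digits(index, levels):
    """Inverse of digits_to_index under the same ordering assumption."""
    digits = []
    for n in levels:
        digits.append(index % n)
        index //= n
    return digits


if __name__ == "__main__":
    print(codebook_size(LEVELS))   # 8*8*8*5*5*5
    print(token_rate())            # 20 fps / 4
    print(digits_to_index([7, 7, 7, 4, 4, 4], LEVELS))
```

Because every code is just a point on a fixed grid, no codebook vectors are learned or stored, which is why the quantizer is deterministic and needs no EMA updates.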

Usage

Training

  • Dataset: MotionMillion (consolidated_v1, absolute-root variant)
  • Training steps: 300,000
  • Codebook utilization: 99.3% (63,574 / 64,000 codes active)
  • Token autocorrelation lag-1: 0.0011 (near-zero repetition)
  • Normalized entropy: 0.960
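
The three token statistics above can be computed from a flat sequence of token indices roughly as follows. This is a sketch under assumptions: the exact definitions used for this model are not given here, so normalized entropy is taken as Shannon entropy of token usage divided by the log of the codebook size, and lag-1 autocorrelation as the standard sample autocorrelation of the index sequence. Function names are hypothetical.

```python
from collections import Counter
from math import log


def utilization(tokens, codebook_size):
    """Fraction of codebook entries that appear at least once."""
    return len(set(tokens)) / codebook_size


def normalized_entropy(tokens, codebook_size):
    """Shannon entropy of token usage divided by log(codebook_size).

    1.0 means perfectly uniform usage; 0.0 means a single code is used.
    """
    total = len(tokens)
    counts = Counter(tokens)
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(codebook_size)


def lag1_autocorrelation(tokens):
    """Sample autocorrelation of the token index sequence at lag 1.

    Assumption: indices are treated as plain numbers, which is only a
    rough proxy for repetition in a categorical sequence.
    """
    n = len(tokens)
    mean = sum(tokens) / n
    denom = sum((t - mean) ** 2 for t in tokens)
    if denom == 0:
        return 0.0  # constant sequence: define as 0 to avoid dividing by zero
    numer = sum((tokens[i] - mean) * (tokens[i + 1] - mean) for i in range(n - 1))
    return numer / denom


if __name__ == "__main__":
    # Reported utilization: 63,574 active codes out of 64,000.
    print(round(100 * 63574 / 64000, 1))
```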