motion-tf-fsq64k-causal

A causal Transformer VQ-VAE motion tokenizer trained on the MotionMillion dataset. It encodes 272-dimensional human motion sequences into discrete token indices using Finite Scalar Quantization (FSQ), whose implicit codebook has 64,000 entries.

Architecture

Encoder: 6-layer causal Transformer with RoPE + 4× temporal mean-pool downsampling
Quantizer: FSQ with levels [8,8,8,5,5,5] → 64,000 codes (deterministic, no EMA)
Decoder: 6-layer causal Transformer with 4× learned upsampling
Input: 272D motion (HumanML3D absolute-root representation, 20 fps)
Parameters: 68.6M
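
The quantizer line above can be illustrated with a small sketch of how FSQ levels [8,8,8,5,5,5] give 64,000 codes and how a 6-dimensional latent could map to a single token index. This is a simplified illustration, not this model's code: the function names are hypothetical, the tanh-then-round bounding is a simplification of the usual FSQ formulation, and the digit ordering (first dimension varies fastest) is an assumption.

```python
import math

LEVELS = [8, 8, 8, 5, 5, 5]  # FSQ levels stated in the model card


def codebook_size(levels):
    """Size of the implicit codebook: the product of per-dimension levels."""
    return math.prod(levels)


def fsq_digits(z, levels):
    """Quantize each latent dimension to an integer in [0, n_levels - 1].

    Simplified bounding (assumption): squash with tanh, rescale to
    [0, n_levels - 1], then round to the nearest integer.
    """
    digits = []
    for value, n_levels in zip(z, levels):
        unit = (math.tanh(value) + 1.0) / 2.0  # maps to [0, 1]
        digits.append(round(unit * (n_levels - 1)))
    return digits


def digits_to_index(digits, levels):
    """Mixed-radix encoding of the per-dimension digits into one token index."""
    index, basis = 0, 1
    for digit, n_levels in zip(digits, levels):
        index += digit * basis
        basis *= n_levels
    return index


def index_to_digits(index, levels):
    """Inverse of digits_to_index."""
    digits = []
    for n_levels in levels:
        digits.append(index % n_levels)
        index //= n_levels
    return digits


if __name__ == "__main__":
    print(codebook_size(LEVELS))  # 8*8*8*5*5*5 = 64000
    d = fsq_digits([0.3, -1.2, 2.0, 0.0, -0.5, 0.9], LEVELS)
    print(d, digits_to_index(d, LEVELS))
```

Because the mapping is a fixed rounding rule, there is no learned codebook to update, which is consistent with the "deterministic, no EMA" note. With 4× temporal downsampling at 20 fps, the tokenizer emits about 5 tokens per second of motion.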

Usage

Training

Dataset: MotionMillion (consolidated_v1, absolute-root variant)
Training steps: 300,000
Codebook utilization: 99.3% (63,574 / 64,000 codes active)
Token autocorrelation lag-1: 0.0011 (near-zero repetition)
Normalized entropy: 0.960
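
As a rough guide to how the token statistics above could be computed from a stream of token indices, here is a minimal sketch. The card does not give exact definitions, so these are assumptions: normalized entropy is taken as Shannon entropy divided by the log of the full codebook size, and the lag-1 statistic is approximated by the fraction of consecutive identical tokens, which is only one possible reading of "autocorrelation". All function names are hypothetical.

```python
import math
from collections import Counter


def codebook_utilization(tokens, codebook_size):
    """Fraction of codebook entries that appear at least once."""
    return len(set(tokens)) / codebook_size


def normalized_entropy(tokens, codebook_size):
    """Shannon entropy of token usage, divided by log(codebook_size).

    Assumption: normalization uses the full codebook size, so 1.0 means
    perfectly uniform usage over all codes.
    """
    counts = Counter(tokens)
    total = len(tokens)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(codebook_size)


def lag1_repeat_rate(tokens):
    """Fraction of adjacent token pairs that are identical (assumed proxy)."""
    if len(tokens) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(tokens, tokens[1:]))
    return repeats / (len(tokens) - 1)


if __name__ == "__main__":
    toy = [0, 1, 2, 3, 3, 1, 0, 2]  # toy stream over a 4-entry codebook
    print(codebook_utilization(toy, 4))
    print(normalized_entropy(toy, 4))
    print(lag1_repeat_rate(toy))
```

The reported utilization follows directly from the counts in the card: 63,574 / 64,000 ≈ 0.993.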
