YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
TokenGS
TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens
Jiawei Ren*, Michal Tyszkiewicz*, Jiahui Huang†, Zan Gojcic†
* indicates equal contribution, † indicates equal advising
Paper · Project Page · HuggingFace
TokenGS predicts 3D Gaussians with a self-supervised rendering objective. An encoder–decoder stacks learnable Gaussian tokens so the number of primitives is not tied to image resolution or view count.
News
- 2026.6.3: Model improvement release: Latent bottleneck models, Scene-latent tuning, and Mean-of-gradients training. See the release_2026.6 note.
- 2026.6.3: Released Kubric training and inference.
- 2026.6.2: Released TokenGS model weights on HuggingFace.
Installation
Install the package in editable mode (dependencies include PyTorch, gsplat, and fused-ssim via pyproject.toml):
uv pip install -e .
Environment: Python 3.11, CUDA 12.6+ (see pyproject.toml for pinned versions).
Data: DL3DV layout, symlinks, and dataset_kwargs are described in data/DATA.md.
Evaluation
Place weights under checkpoints/ (or pass any path to --resume). Metrics are written to <workspace>/metrics.txt; the workspace directory is created automatically.
| Checkpoint | Eval preset |
|---|---|
dl3dv_2v.safetensors |
eval_dl3dv_2view |
dl3dv_4v.safetensors |
eval_dl3dv_4view |
dl3dv_6v.safetensors |
eval_dl3dv_6view |
Example (6-view preset):
accelerate launch --config_file acc_configs/gpu1.yaml \
-m tokengs.evaluate eval_dl3dv_6view \
--workspace results/dl3dv_eval/6view \
--resume checkpoints/dl3dv_6v.safetensors \
--use_ttt_for_eval \
--eval_n_media_dumps 20 \
Presets eval_dl3dv_2view and eval_dl3dv_4view select the matching evaluation JSONs. Remove --use_ttt_for_eval to turn off test-time token tuning.
Media dumps: --eval_n_media_dumps N writes PNGs, MP4s, depth vis, and PLY for the first N dataloader batches under <workspace>/{images,videos,depths,gaussians}/ (default 0 = metrics only).
Training
1. Base run (train_dl3dv_base preset):
accelerate launch --config_file acc_configs/gpu8.yaml \
-m tokengs.train train_dl3dv_base \
--workspace workspace/dl3dv_base \
--experiment_name dl3dv_base
2. Finetune from a checkpoint (presets finetune_dl3dv_2view, finetune_dl3dv_4view, finetune_dl3dv_6view):
accelerate launch --config_file acc_configs/gpu8.yaml \
-m tokengs.train finetune_dl3dv_2view \
--workspace workspace/dl3dv_2view \
--experiment_name dl3dv_2view \
--resume workspace/dl3dv_base/model.safetensors
Swap the subcommand for 4- or 6-view finetune presets as needed.
Kubric Dynamic Model
A Kubric dynamic checkpoint is available at checkpoints_26.06/kubric_dyn.safetensors. Point data/kubric at the Kubric multi-view dump:
data/kubric/
v0/
<scene>/
output_000.tar
output_001.tar
...
v1/
v2/
...
The top-level v0, v1, v2, ... folders are data splits. Each output_{view:03d}.tar contains metadata.json, rgba_{frame:05d}.png, and depth_{frame:05d}.tiff. The dynamic presets use pointmap camera scaling, so input-frame depth TIFFs are loaded for scale normalization.
Warm-Start Strategy
The released dynamic preset warm-starts static GS tokens from the DL3DV base checkpoint and initializes dynamic GS tokens from the static tokens. Its auxiliary dynamic-only render loss is held at 0.3 for 5K steps, then linearly decays to 0 over the next 10K steps.
Finetune example:
accelerate launch --config_file acc_configs/gpu8.yaml \
-m tokengs.train finetune_dl3dv_kubric_dyn_release \
--workspace workspace/kubric_dyn \
--experiment_name kubric_dyn \
--resume checkpoints/dl3dv_base.safetensors
The render script uses the released Kubric preset by default; pass --workspace only when rendering a finetuned workspace with its own config.yaml.
python scripts/render_kubric_dyn.py \
--preset finetune_dl3dv_kubric_dyn_release \
--ckpt checkpoints_26.06/kubric_dyn.safetensors \
--kubric_root data/kubric \
--out_dir results/kubric_dyn_renders \
--scene_idx 0 \
--n_scenes 4 \
--fps 8
For each scene, the render script writes fixed-camera, trajectory, dynamic-only, static-only, and input-camera hstack videos.
License
TokenGS is released under the Apache License 2.0. See CONTRIBUTING.md for contribution guidelines.
Citation
If you use TokenGS in your research, please cite:
@article{tokengs2026,
title={TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens},
author={Jiawei Ren and Michal Tyszkiewicz and Jiahui Huang and Zan Gojcic},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}
