Request access to the World Tracing scene model (840x840)
These checkpoints are released for non-commercial research use under the CC BY-NC-ND 4.0 license (Attribution-NonCommercial-NoDerivatives). Please share a few details below so we can keep a light audit trail of how the weights are used in the wild. Requests are reviewed manually, typically within 1-3 business days.
World Tracing: Scene Model (6-layer, 840 × 840, r69l)
Access
The checkpoints in this repo are released under the CC BY-NC-ND 4.0 license, but downloads are gated so we can keep a light audit trail of how the model is used. To download:
- Scroll up and fill in the "Submit access request" form (basic contact info + a short note on intended use).
- We review every request manually, usually within 1-3 business days. You will receive an email from Hugging Face once your request is approved.
- After approval, log in with `huggingface-cli login` (or set `HF_TOKEN`) and run any of the inference examples from the GitHub repo. The `wt` package picks the token up automatically, and `--ckpt r69l` triggers a normal `hf_hub_download`.
Note: this is a manual review flow, not an auto-approve click-through. We read every request individually, so please give a one-line description of what you plan to use the weights for.
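For scripts that fetch the weights directly instead of going through the `wt` package, the token handling described above might look roughly like this. It is a minimal sketch, not the repo's own code: `REPO_ID` is a placeholder you must replace with this repo's id, and `resolve_token` and `download_weights` are hypothetical helper names. Only `huggingface_hub.hf_hub_download`, the `HF_TOKEN` variable, and the `model.pt` filename come from this card.

```python
import os
from typing import Mapping, Optional

# Placeholder: replace with the id of this gated repo ("<owner>/<name>").
REPO_ID = "<owner>/<repo-name>"


def resolve_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return HF_TOKEN if set, else None so the cached login token is used."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def download_weights(repo_id: str = REPO_ID, filename: str = "model.pt") -> str:
    """Download the checkpoint and return its local cache path.

    Raises an error from huggingface_hub if access has not been approved yet.
    """
    # Imported lazily so resolve_token works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # token=None lets huggingface_hub fall back to the token stored by
    # `huggingface-cli login`.
    return hf_hub_download(repo_id=repo_id, filename=filename, token=resolve_token())
```

Calling `download_weights()` only succeeds after the access request has been approved for the account that owns the token.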
EMA-only release weights for the r69l high-resolution scene model from World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible.
This is the 840 × 840 variant of the scene model: same 1.5 B-parameter architecture as `r69e`, warm-resumed from the 504-res checkpoint and fine-tuned at `image_size=840` (60 × 60 patches per layer) for noticeably sharper geometry. These are the weights that power the Scene tab of the interactive demo.
- Repo: https://github.com/haoz19/world-tracing
- Project page: https://haoz19.github.io/world-tracing-page/
- Config name: `r69l`
- Architecture: `MultilayerXYZModel`, 1.5 B params (identical to `r69e`)
- Input: 840 × 840 full-frame RGB (no alpha)
- Output: per-layer XYZ in camera space, 6 stacked depth maps
- Training data: Evermotion + IT-Happy indoor renders. The model is trained on indoor renders without sky, so pre-mask the sky externally for outdoor inputs.
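To make the output description concrete, here is a small NumPy sketch of how six stacked XYZ layers could be handled. The array layout `(layer, row, col, xyz)` and the use of the z channel as depth are assumptions made for illustration; the actual tensor returned by `MultilayerXYZModel` may be organised differently, and the values below are synthetic.

```python
import numpy as np

NUM_LAYERS, SIZE = 6, 840  # layer count and resolution from the list above

# ASSUMPTION: output laid out as (layer, row, col, xyz); the real tensor may differ.
xyz = np.zeros((NUM_LAYERS, SIZE, SIZE, 3), dtype=np.float32)
# Synthetic data: give layer k a constant depth of k + 1.
xyz[..., 2] = np.arange(1, NUM_LAYERS + 1, dtype=np.float32)[:, None, None]

# ASSUMPTION: camera-space z is the depth axis, so each layer yields one depth map.
depth_maps = xyz[..., 2]
print(depth_maps.shape)  # (6, 840, 840)

# Flatten one layer into an (N, 3) point cloud, one point per pixel.
points_layer0 = xyz[0].reshape(-1, 3)
print(points_layer0.shape)  # (705600, 3)
```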
Resolution matters: this checkpoint was fine-tuned at 840 × 840 and must be run at that resolution (the `r69l` config in `wt/checkpoint.py` handles this). Feeding 504-res inputs is heavily out-of-distribution and produces coarse, patch-aligned "voxel block" depth. For 504 × 504 inference use `scene-model-6layer` (`r69e`) instead.
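The resolution constraint can be sanity-checked before inference. The sketch below assumes a square patch size of 14 px, which is only inferred from the figures in this card (840 / 60 = 14) and is not stated explicitly. `PATCH_SIZE`, `patch_grid`, and `check_input_size` are hypothetical names, not part of the `wt` package.

```python
EXPECTED_SIZE = 840  # resolution r69l was fine-tuned at (from this card)
PATCH_SIZE = 14      # ASSUMPTION: inferred from 840 / 60, not stated in the source


def patch_grid(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Return the number of patches along one side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError(f"{image_size} is not divisible by patch size {patch_size}")
    return image_size // patch_size


def check_input_size(width: int, height: int, expected: int = EXPECTED_SIZE) -> None:
    """Raise if an image is not already at the checkpoint's native resolution."""
    if (width, height) != (expected, expected):
        raise ValueError(
            f"got {width}x{height}, expected {expected}x{expected}; "
            "resize the input or use the 504-res checkpoint (r69e)"
        )


print(patch_grid(840))  # 60 patches per side, matching the 60 x 60 grid
print(patch_grid(504))  # 36 patches per side under the same assumed patch size
check_input_size(840, 840)  # passes silently
```

Under this assumption a 504-res input gives the model a 36 × 36 grid per layer instead of the 60 × 60 grid it was fine-tuned on, which is one way to picture the mismatch.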
Files
| File | Size | Format |
|---|---|---|
| `model.pt` | 5.59 GB | bare state_dict, float32 |
EMA weights only: ~26 % of the original training checkpoint.
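As a rough consistency check, the file size can be related to the stated parameter count, since a bare float32 state_dict stores about 4 bytes per parameter. The sketch below ignores file overhead and non-parameter buffers, and it shows both readings of "GB" because the card does not say which unit is meant. `estimated_params` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4


def estimated_params(size_bytes: float, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Estimate the parameter count of a bare state_dict, ignoring file overhead."""
    return size_bytes / bytes_per_param


size_binary = 5.59 * 2**30   # if "GB" is read as binary gibibytes
size_decimal = 5.59 * 10**9  # if "GB" is read as decimal gigabytes

print(f"{estimated_params(size_binary) / 1e9:.2f} B params")   # 1.50 B params
print(f"{estimated_params(size_decimal) / 1e9:.2f} B params")  # 1.40 B params

# Implied size of the full training checkpoint if this file is ~26 % of it.
print(f"{5.59 / 0.26:.1f} GB")  # 21.5 GB
```

Both estimates are in the neighbourhood of the stated 1.5 B parameters. The binary reading lands closer, but that is an observation from the arithmetic rather than something the card confirms.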
Usage
git clone https://github.com/haoz19/world-tracing
cd world-tracing
pip install -e ".[viz]"
python examples/infer_scene.py \
--image examples/test_images/scene/scene_indoor_01_modern_living_room__seed42.png \
--ckpt r69l \
--config r69l \
--out /tmp/wt_scene.rrd
A bare `--ckpt r69l` triggers `huggingface_hub.hf_hub_download` against this repo.
Citation
@misc{zhang2026worldtracinggenerativepixelaligned,
title = {World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible},
author = {Hao Zhang and Mohamed El Banani and Jen-Hao Cheng and Paul Zhang
and Yi Hua and Ben Mildenhall and Christoph Lassner
and Narendra Ahuja and Gengshan Yang},
year = {2026},
eprint = {2606.13652},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2606.13652}
}
License
CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives); see the GitHub repo. Non-commercial research use only.