Request access to the World Tracing scene model (840x840)

These checkpoints are released for non-commercial research use under the CC BY-NC-ND 4.0 license (Attribution-NonCommercial-NoDerivatives). Please share a few details below so we can keep a light audit trail of how the weights are used in the wild. Requests are reviewed manually, typically within 1-3 business days.

World Tracing — Scene Model (6-layer, 840 × 840, r69l)

Access

The checkpoints in this repo are released under the CC BY-NC-ND 4.0 license, but downloads are gated so we can keep a light audit trail of how the model is used. To download:

1. Scroll up and fill in the "Submit access request" form (basic contact info plus a short note on intended use).
2. We review every request manually, usually within 1-3 business days. You will receive an email from Hugging Face once your request is approved.
3. After approval, log in with `huggingface-cli login` (or set `HF_TOKEN`) and run any of the inference examples from the GitHub repo. The `wt` package picks the token up automatically, and `--ckpt r69l` triggers a normal `hf_hub_download`.
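
If you prefer to fetch the file by hand rather than through the example scripts, the download step can be sketched roughly as follows. This is a minimal sketch, not the `wt` package's own code: it assumes the repo id is `haoz19/scene-model-6layer-840` and that the file is `model.pt` as listed under Files, and `resolve_token` and `fetch_checkpoint` are hypothetical helper names.

```python
import os

REPO_ID = "haoz19/scene-model-6layer-840"  # assumption: repo id of this model page
FILENAME = "model.pt"                      # file name as listed under Files


def resolve_token(env=None):
    """Return HF_TOKEN if it is set and non-empty, otherwise None."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def fetch_checkpoint(repo_id=REPO_ID, filename=FILENAME):
    """Download the gated checkpoint and return its path in the local cache."""
    # Imported lazily so resolve_token() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # With token=None, huggingface_hub falls back to the token stored by
    # `huggingface-cli login`.
    return hf_hub_download(repo_id=repo_id, filename=filename, token=resolve_token())
```

Calling `fetch_checkpoint()` before your request is approved should fail with an authorization error from the Hub, which is a quick way to check whether access has been granted yet.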

Note: this is a manual review flow, not an auto-approve click-through. We read every request individually, so please give a one-line description of what you plan to use the weights for.

EMA-only release weights for the r69l high-resolution scene model from World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible.

This is the 840 × 840 variant of the scene model: same 1.5 B-parameter architecture as r69e, warm-resumed from the 504-res checkpoint and fine-tuned at image_size=840 (60 × 60 patches per layer) for noticeably sharper geometry. These are the weights that power the Scene tab of the interactive demo.
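
The patch counts implied by these numbers can be checked with a little arithmetic. The sketch below infers a 14 px patch size from 840 / 60 and assumes, without confirmation from the repo, that the 504-res model uses the same patch size; `patch_grid` is a hypothetical helper.

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, patches per layer) for a square input."""
    if image_size % patch_size:
        raise ValueError(f"{image_size} is not a multiple of {patch_size}")
    side = image_size // patch_size
    return side, side * side


# 60 x 60 patches at 840 px implies 14 px patches.
PATCH_SIZE = 840 // 60

side_840, patches_840 = patch_grid(840, PATCH_SIZE)
# Assumption: the 504-res checkpoint uses the same patch size.
side_504, patches_504 = patch_grid(504, PATCH_SIZE)

print(f"840 px: {side_840} x {side_840} = {patches_840} patches per layer")
print(f"504 px: {side_504} x {side_504} = {patches_504} patches per layer")
print(f"ratio:  {patches_840 / patches_504:.2f}x more patches per layer at 840 px")
```

Under that assumption the 840 px model sees 3600 patches per layer against 1296 at 504 px, about 2.78 times as many.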

- Repo: https://github.com/haoz19/world-tracing
- Project page: https://haoz19.github.io/world-tracing-page/
- Config name: r69l
- Architecture: MultilayerXYZModel, 1.5 B params (identical to r69e)
- Input: 840 × 840 full-frame RGB (no alpha)
- Output: per-layer XYZ in camera space, 6 stacked depth maps
- Training data: Evermotion + IT-Happy indoor renders. The model is trained on indoor renders without sky, so pre-mask the sky externally for outdoor inputs.

Resolution matters: this checkpoint was fine-tuned at 840 × 840 and must be run at that resolution (the r69l config in wt/checkpoint.py handles this). Feeding 504-res inputs is heavily out-of-distribution and produces coarse, patch-aligned "voxel block" depth. For 504 × 504 inference use scene-model-6layer (r69e) instead.
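
One way to guard against a resolution mismatch in your own scripts is a small lookup that refuses any other pairing. This is only a sketch of the rule stated above: `config_for_resolution` is a hypothetical helper and not part of the `wt` package, whose own handling lives in wt/checkpoint.py.

```python
# Pairing taken from the text above: 840 px goes with r69l, 504 px with r69e.
CONFIG_BY_RESOLUTION = {840: "r69l", 504: "r69e"}


def config_for_resolution(image_size):
    """Return the config name trained at `image_size`, or raise if none exists."""
    try:
        return CONFIG_BY_RESOLUTION[image_size]
    except KeyError:
        supported = ", ".join(str(s) for s in sorted(CONFIG_BY_RESOLUTION))
        raise ValueError(
            f"no checkpoint trained at {image_size} px; supported sizes: {supported}"
        ) from None


print(config_for_resolution(840))
print(config_for_resolution(504))
```

Failing loudly on an unsupported size is a deliberate choice here, since a silent fallback would produce the blocky depth described above instead of an error.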

Files

| File | Size | Format |
|---|---|---|
| `model.pt` | 5.59 GB | bare `state_dict`, float32 |

EMA weights only, roughly 26 % of the size of the original training checkpoint.
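
As a sanity check, the file size can be related to the stated parameter count. The sketch below assumes 4 bytes per float32 parameter and ignores any non-parameter buffers in the `state_dict`. Whether "GB" on the file listing means 10^9 or 2^30 bytes is not stated, so both readings are computed.

```python
FILE_SIZE = 5.59      # size of model.pt as listed in the Files table
BYTES_PER_PARAM = 4   # float32
EMA_FRACTION = 0.26   # "~26 %" of the original training checkpoint


def params_from_size(size, unit_bytes):
    """Estimate the parameter count from a file size given in `unit_bytes` units."""
    return size * unit_bytes / BYTES_PER_PARAM


params_decimal = params_from_size(FILE_SIZE, 10**9)  # if GB means 10^9 bytes
params_binary = params_from_size(FILE_SIZE, 2**30)   # if GB means 2^30 bytes
training_ckpt_size = FILE_SIZE / EMA_FRACTION        # same unit as FILE_SIZE

print(f"decimal GB: about {params_decimal / 1e9:.2f} B params")
print(f"binary GiB: about {params_binary / 1e9:.2f} B params")
print(f"implied training checkpoint: about {training_ckpt_size:.1f} GB")
```

The binary reading gives about 1.50 B parameters, which matches the stated 1.5 B, and the 26 % figure implies a training checkpoint of roughly 21.5 GB.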

Usage

git clone https://github.com/haoz19/world-tracing
cd world-tracing
pip install -e ".[viz]"

python examples/infer_scene.py \
    --image  examples/test_images/scene/scene_indoor_01_modern_living_room__seed42.png \
    --ckpt   r69l \
    --config r69l \
    --out    /tmp/wt_scene.rrd

Bare --ckpt r69l triggers huggingface_hub.hf_hub_download against this repo.

Citation

@misc{zhang2026worldtracinggenerativepixelaligned,
  title         = {World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible},
  author        = {Hao Zhang and Mohamed El Banani and Jen-Hao Cheng and Paul Zhang
                   and Yi Hua and Ben Mildenhall and Christoph Lassner
                   and Narendra Ahuja and Gengshan Yang},
  year          = {2026},
  eprint        = {2606.13652},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2606.13652}
}

License

CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives) — see the GitHub repo. Non-commercial research use only.
