latentgen-sam3D demo (Inv 013)
DF v2 prior + K-aniso FLUX decoder for object-preserving scene completion on LSUN bedrooms. Token format: per-detected-object 3D bounding box (10-d: trans+quat+aniso scale) + 8-token appearance feature (8 × 1024-d) + scene camera (fx, fy) + background token (1024-d).
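As a rough sanity check on the token format above, the sketch below just counts scalars per object and per scene. The 3+4+3 split of the 10-d box (translation, quaternion, anisotropic scale) is an assumed reading of "trans+quat+aniso scale", and `SceneLayout` is a hypothetical helper, not a class from the repo; the model may pack these values differently internally.

```python
from dataclasses import dataclass

# Dimensions taken from the token format description; the 3+4+3 split is an assumption.
BBOX_DIM = 3 + 4 + 3      # translation + quaternion + anisotropic scale = 10
APP_TOKENS = 8            # appearance tokens per object
APP_DIM = 1024            # width of each appearance token
CAMERA_DIM = 2            # fx, fy
BG_DIM = 1024             # single background token


@dataclass(frozen=True)
class SceneLayout:
    """Hypothetical helper that only does bookkeeping arithmetic."""
    num_objects: int

    @property
    def per_object_scalars(self) -> int:
        # One 10-d box plus 8 appearance tokens of 1024 values each.
        return BBOX_DIM + APP_TOKENS * APP_DIM

    @property
    def scene_scalars(self) -> int:
        # Camera intrinsics plus the background token.
        return CAMERA_DIM + BG_DIM

    @property
    def total_scalars(self) -> int:
        return self.num_objects * self.per_object_scalars + self.scene_scalars


layout = SceneLayout(num_objects=3)
print(layout.per_object_scalars, layout.scene_scalars, layout.total_scalars)
```

The appearance feature dominates the per-object size: the box contributes 10 values against 8192 for appearance.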
Files
| File | Size | Purpose |
|---|---|---|
| prior_ckpt.pt | ~1.4 GB | DF v2 prior @ step 250K (118M params, 12L/dim768/12H) |
| decoder_ckpt.pt | ~12 GB | FLUX.2-Klein-4B + K-aniso cross-attn adapters @ 170K |
| demo_lmdb/ | ~10 MB | 20 hand-picked LSUN records (precomputed app_feat_k, bbox, etc.) |
| demo_manifest.json | <1 KB | image_id → per-slot SAM3D labels |
| demo_catalog.png | ~2 MB | Visual grid of the 20 candidates with labels |
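If you would rather filter candidates by label than scan the catalog image, something like the sketch below may help. It assumes demo_manifest.json is a flat JSON object mapping each image_id to a list of per-slot label strings; that schema is a guess from the table description, and the ids and labels in the example are placeholders, not real manifest contents.

```python
import json


def images_with_labels(manifest, wanted):
    """Return image_ids whose slot labels include every label in `wanted`.

    Assumes `manifest` maps image_id -> list of label strings (unverified schema).
    """
    wanted = set(wanted)
    return sorted(
        image_id
        for image_id, labels in manifest.items()
        if wanted.issubset(labels)
    )


# Hypothetical manifest contents, for illustration only.
example = json.loads(
    '{"img_a": ["bed", "chair"], "img_b": ["bed", "table", "chair"]}'
)
print(images_with_labels(example, ["bed", "table"]))

# With the real file (path as downloaded in the Quickstart):
# with open("demo_assets/demo_manifest.json") as f:
#     manifest = json.load(f)
```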
Quickstart
# 1. Clone the code repo (assumes you have access)
git clone git@github.com:reve-ai/latentgen-sam3D.git
cd latentgen-sam3D
# 2. Pull this demo bundle into demo_assets/
huggingface-cli download sunovivid/latentgen-sam3d-demo \
--local-dir demo_assets
# 3. Open demo_assets/demo_catalog.png and pick an image_id
# 4. Run completion with multiple seeds
bash sh/demo.sh --image-id img_022247 --seeds 0,1,2,3 --preserve bed,table,chair
Outputs land at demo_outputs/{image_id}_seed{S}.png: a 3-panel render of
[real input | gen +bg | gen no-bg]. Green bbox = preserved (label match), red
= sampled from prior.
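The output naming and the green/red rule can be sketched as below. This is only an illustration of the conventions described here, not code from sh/demo.sh: `expected_outputs` and `bbox_color` are hypothetical helpers, and exact string matching between a slot label and the --preserve list is an assumption.

```python
from pathlib import PurePosixPath


def expected_outputs(image_id, seeds, out_dir="demo_outputs"):
    """List the output paths implied by the {image_id}_seed{S}.png pattern."""
    return [str(PurePosixPath(out_dir) / f"{image_id}_seed{s}.png") for s in seeds]


def bbox_color(label, preserve):
    """Green if the slot's label is in the preserve set, red otherwise.

    Exact string equality is assumed; the real matching rule may differ.
    """
    return "green" if label in preserve else "red"


# Mirrors the Quickstart flags: --seeds 0,1,2,3 --preserve bed,table,chair
preserve = set("bed,table,chair".split(","))
paths = expected_outputs("img_022247", [0, 1, 2, 3])
print(paths[0])
print(bbox_color("bed", preserve), bbox_color("lamp", preserve))
```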
Notes
- The prior is mid-training (1M-step run, currently @250K). Quality will improve with more training; this bundle is for early figure prep.
- The decoder ckpt at 170K is the latest available. The 146K ckpt, the original feature-space target, was pruned by the keep-3-latest retention policy. Drift between the two is small.
- For arbitrary new images outside the 20-record subset, you'd need to run
the SAM3D pipeline + decoder voxel_encoder to extract
app_feat_k first; see training_package/preprocess/extract_app_feat_dataset.py.