S23DR 2026: WireframeDETR

Submission to the S23DR 2026 Challenge.

Public test HSS: 0.575 (F1=0.664, IoU=0.516)

This project was built on Modal GPU credits left over from a previous hackathon. Each run took hours, so the budget kept experiments deliberate. Several training configurations were explored, but a full factorial study of every contribution wasn't feasible.

Approach

End-to-end 3D wireframe prediction via DETR-style set prediction over COLMAP point clouds. Each learned query regresses one edge as a pair of 3D endpoints, i.e. a 6D vector (x1,y1,z1,x2,y2,z2). At training time, Hungarian matching assigns predictions to ground-truth edges one-to-one.
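To make the matching step concrete, here is a minimal sketch of one-to-one assignment between predicted and ground-truth edges. It is not the submission's code. The cost function (summed endpoint distance, taken over both endpoint orderings so that a flipped edge still matches) is an assumption, and the names `edge_cost` and `match_edges` are hypothetical. For readability it searches all assignments by brute force, which finds the same optimum the Hungarian algorithm would, but only scales to a handful of edges.

```python
from itertools import permutations
from math import dist


def edge_cost(pred, gt):
    """Summed endpoint distance, minimised over the two endpoint orderings."""
    p1, p2 = pred[:3], pred[3:]
    g1, g2 = gt[:3], gt[3:]
    same = dist(p1, g1) + dist(p2, g2)
    flipped = dist(p1, g2) + dist(p2, g1)
    return min(same, flipped)


def match_edges(preds, gts):
    """Return (pred_index, gt_index) pairs with minimal total cost.

    Brute-force stand-in for the Hungarian algorithm. Assumes
    len(preds) >= len(gts); unmatched predictions are simply left out.
    """
    best_cost, best_pairs = float("inf"), []
    for pred_idx in permutations(range(len(preds)), len(gts)):
        cost = sum(edge_cost(preds[p], gts[g]) for g, p in enumerate(pred_idx))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(pred_idx)]
    return best_pairs


gts = [(0.0, 0.0, 0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0, 1.0, 0.0)]
preds = [
    (0.0, 1.1, 0.0, 0.0, 0.0, 0.0),  # close to gts[1], endpoints flipped
    (0.1, 0.0, 0.0, 1.0, 0.0, 0.0),  # close to gts[0]
    (5.0, 5.0, 5.0, 6.0, 6.0, 6.0),  # far from everything, stays unmatched
]
pairs = match_edges(preds, gts)
```

In a real training loop the same assignment would normally come from a polynomial-time solver rather than enumeration, and the cost would usually include a classification or confidence term as well.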

Our contributions:

  • Contrastive Denoising Training (CDN): adapted from DN-DETR; injects GT-aligned denoising queries alongside learned queries to stabilise Hungarian matching in early epochs
  • Multi-scale encoder: learned softmax-weighted average of last K=3 encoder layer outputs, giving the decoder access to both fine-grained and abstract representations
  • Progressive auxiliary loss weighting: decoder layer i weighted at 0.5 + 0.5·(i+1)/N
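The two arithmetic pieces in this list can be sketched in a few lines of plain Python. This is an illustration rather than the submission's code: it assumes that i is 0-indexed and that N is the number of decoder layers (so the last layer gets weight 1.0), and that the multi-scale fusion uses one learned scalar logit per encoder layer. The function names are hypothetical.

```python
import math


def aux_loss_weights(num_decoder_layers):
    """Weight for decoder layer i (0-indexed): 0.5 + 0.5 * (i + 1) / N."""
    n = num_decoder_layers
    return [0.5 + 0.5 * (i + 1) / n for i in range(n)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_outputs, logits):
    """Softmax-weighted average of per-layer feature vectors.

    layer_outputs: K feature vectors of equal length (one per encoder layer).
    logits: K scalars; in a model these would be learned parameters.
    """
    weights = softmax(logits)
    dim = len(layer_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, layer_outputs))
        for d in range(dim)
    ]


# With N=5 decoder layers the weights rise linearly towards 1.0
# (approximately 0.6, 0.7, 0.8, 0.9, 1.0).
print(aux_loss_weights(5))

# With all logits equal, the fusion reduces to a plain mean of the K layers.
print(fuse_layers([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.0, 0.0, 0.0]))
```

Under these assumptions, earlier decoder layers still contribute at least half the weight of the final layer, so they are supervised without dominating the loss.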

Model input: plain 3-channel RGB per point. No semantic feature encoding.

Adapted from jastermark/S23DR2026:

  • Gestalt-guided point sampling and COLMAP projection pipeline
  • Post-processing (confidence filtering, vertex merging, gap filling)
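As a rough illustration of the first two post-processing steps, the sketch below drops low-confidence edges and then merges endpoints that lie close together into shared vertices. It is not the pipeline from jastermark/S23DR2026: the function name, the thresholds `min_conf` and `merge_radius`, and the greedy merge rule (the first vertex seen in a neighbourhood becomes its representative) are all assumptions, and gap filling is left out.

```python
from math import dist


def filter_and_merge(edges, scores, min_conf=0.5, merge_radius=0.1):
    """Turn scored 6D edges into (vertices, index pairs).

    edges: list of (x1, y1, z1, x2, y2, z2) tuples.
    scores: one confidence per edge.
    min_conf, merge_radius: hypothetical thresholds, not tuned values.
    """
    vertices = []

    def vertex_id(point):
        # Reuse an existing vertex if one lies within merge_radius.
        for i, v in enumerate(vertices):
            if dist(point, v) <= merge_radius:
                return i
        vertices.append(tuple(point))
        return len(vertices) - 1

    pairs = []
    for edge, score in zip(edges, scores):
        if score < min_conf:
            continue  # confidence filtering
        a, b = vertex_id(edge[:3]), vertex_id(edge[3:])
        key = (min(a, b), max(a, b))
        # Skip edges that collapsed to a point and exact duplicates.
        if a != b and key not in pairs:
            pairs.append(key)
    return vertices, pairs


edges = [
    (0.0, 0.0, 0.0, 1.0, 0.0, 0.0),
    (1.05, 0.0, 0.0, 1.0, 1.0, 0.0),  # first endpoint merges into (1, 0, 0)
    (9.0, 9.0, 9.0, 8.0, 8.0, 8.0),   # dropped: low confidence
]
scores = [0.9, 0.8, 0.1]
verts, pairs = filter_and_merge(edges, scores)
```

The output is a vertex list plus index pairs, which is the same shape of result as the `verts, edges` returned at inference time.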

Results

Approach                    | Split       | F1    | IoU   | HSS
Perceiver baseline          | cleaned val | -     | -     | 0.350
PointNet two-stage (Path B) | public test | 0.497 | 0.409 | 0.442
WireframeDETR (ours)        | cleaned val | 0.603 | 0.471 | 0.534
WireframeDETR (ours, best)  | public test | 0.664 | 0.516 | 0.575

Architecture

  • Embedding dim: 384, Queries: 128, Encoder layers: 4, Decoder layers: 5
  • ~22.7M parameters
  • CDN groups: 5, λ_pos=0.4, λ_neg=0.8
  • Training: AdamW lr=1e-4, OneCycle schedule, batch=14, 200 epochs, A100 80GB (~27h)
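One plausible reading of the CDN settings above, following the usual contrastive denoising recipe, is that each group holds one positive and one negative noised copy of every ground-truth edge: positives are perturbed by at most λ_pos and should be reconstructed, negatives are perturbed by between λ_pos and λ_neg and should be rejected. The sketch below illustrates that reading only. Scaling the noise by edge length, the uniform noise distribution, and the name `make_cdn_queries` are assumptions, not details taken from the submission.

```python
import random
from math import dist


def make_cdn_queries(gt_edges, groups=5, lam_pos=0.4, lam_neg=0.8, seed=0):
    """Build noised copies of ground-truth edges for denoising training.

    Returns a list of dicts with the noised 6D edge, the index of the
    source ground-truth edge, the group id, and a positive/negative flag.
    """
    rng = random.Random(seed)
    queries = []
    for group in range(groups):
        for idx, edge in enumerate(gt_edges):
            length = dist(edge[:3], edge[3:])  # assumed noise scale
            # Positive: offset in [0, lam_pos]; negative: [lam_pos, lam_neg].
            for positive, lo, hi in ((True, 0.0, lam_pos), (False, lam_pos, lam_neg)):
                noised = tuple(
                    c + rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi) * length
                    for c in edge
                )
                queries.append(
                    {"edge": noised, "gt_index": idx, "group": group, "positive": positive}
                )
    return queries


gt_edges = [(0.0, 0.0, 0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0, 2.0, 0.0)]
queries = make_cdn_queries(gt_edges)
print(len(queries))  # 5 groups x 2 edges x (1 positive + 1 negative) = 20
```

Because the target of every denoising query is known in advance, these queries bypass Hungarian matching, which is what makes them useful as a stabiliser early in training.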

Checkpoint

wireframe_detr_cdn_multiscale_384d_128q.pth: plain RGB, feature_dim=3

Inference

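import torch

from s23dr_2026.inference import predict_wireframe_v2
from s23dr_2026.model import get_model, load_checkpoint_compat

# Load the checkpoint on CPU, build the model from it, then restore the weights
ckpt = torch.load("wireframe_detr_cdn_multiscale_384d_128q.pth", map_location="cpu")
model = get_model(ckpt)
load_checkpoint_compat(model, ckpt)
model.eval().to("cuda")

# `scene` is not defined in this snippet; it is assumed to be one sample from the challenge dataset
verts, edges = predict_wireframe_v2(scene, model, "cuda")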

Training

# via Modal
modal run pipeline.py --step train --name my-run

# local
python -m s23dr_2026.train \
  --name my-run \
  --ply_dir /path/to/ply_data \
  --embed_dim 384 --num_queries 128 \
  --num_encoder_layers 4 --num_decoder_layers 5 \
  --use_cdn --cdn_groups 5 \
  --scheduler onecycle --num_epochs 200 \
  --batch_size 14 --device cuda

Credits
