S23DR 2026: WireframeDETR
Submission to the S23DR 2026 Challenge.
Public test HSS: 0.575 (F1=0.664, IoU=0.516)
This project was built on Modal GPU credits left over from a previous hackathon. Each run took hours, so the limited budget meant every experiment had to be chosen deliberately. Several training configurations were explored, but a full factorial ablation of each contribution was not feasible.
Approach
End-to-end 3D wireframe prediction via DETR-style set prediction over COLMAP point clouds.
Each predicted edge is a 6D coordinate pair (x1,y1,z1,x2,y2,z2) regressed by a learned query.
Hungarian matching assigns predictions to ground-truth edges at training time.
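As a rough illustration of the matching step, the sketch below assigns predicted edges to ground-truth edges with SciPy's `linear_sum_assignment`. The cost used here (L1 distance, taking the cheaper of the two endpoint orderings) and the helper names `edge_cost_matrix` and `match_edges` are assumptions made for the example, not the cost or code used in this repository.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def edge_cost_matrix(pred, gt):
    """Pairwise L1 cost between predicted and GT edges, ignoring endpoint order.

    pred: (P, 6) array, gt: (G, 6) array, each row (x1, y1, z1, x2, y2, z2).
    """
    gt_flipped = np.concatenate([gt[:, 3:], gt[:, :3]], axis=1)  # swap endpoints
    direct = np.abs(pred[:, None, :] - gt[None, :, :]).sum(axis=-1)
    flipped = np.abs(pred[:, None, :] - gt_flipped[None, :, :]).sum(axis=-1)
    return np.minimum(direct, flipped)  # shape (P, G)


def match_edges(pred, gt):
    """Return (pred_idx, gt_idx) of the minimum-cost one-to-one assignment."""
    cost = edge_cost_matrix(pred, gt)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx


gt = np.array([[0, 0, 0, 1, 0, 0],
               [0, 0, 0, 0, 1, 0]], dtype=float)
pred = np.array([[0.0, 0.9, 0.0, 0.0, 0.1, 0.0],   # near GT edge 1, endpoints swapped
                 [5.0, 5.0, 5.0, 6.0, 5.0, 5.0],   # far from every GT edge
                 [0.1, 0.0, 0.0, 1.0, 0.1, 0.0]])  # near GT edge 0
pred_idx, gt_idx = match_edges(pred, gt)
matches = {int(p): int(g) for p, g in zip(pred_idx, gt_idx)}
```

With more predictions than ground-truth edges, the leftover predictions (index 1 here) stay unmatched; in DETR-style training such queries are typically supervised towards a "no edge" class.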
Our contributions:
- Contrastive Denoising Training (CDN): adapted from DN-DETR; injects GT-aligned denoising queries alongside learned queries to stabilise Hungarian matching in early epochs
- Multi-scale encoder: learned softmax-weighted average of last K=3 encoder layer outputs, giving the decoder access to both fine-grained and abstract representations
- Progressive auxiliary loss weighting: decoder layer i weighted at 0.5 + 0.5·(i+1)/N
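The last two items can be sketched numerically as follows. This is a NumPy illustration of the forward computation only: `fuse_encoder_layers` and `aux_loss_weights` are hypothetical names, the `logits` array stands in for the learned per-layer parameters, and the decoder layer index i is assumed to be 0-based with N the number of decoder layers.

```python
import numpy as np


def fuse_encoder_layers(layer_outputs, logits, k=3):
    """Softmax-weighted average of the last k encoder layer outputs.

    layer_outputs: list of (num_points, dim) arrays, one per encoder layer.
    logits: (k,) array standing in for the learned per-layer parameters.
    """
    stacked = np.stack(layer_outputs[-k:], axis=0)   # (k, num_points, dim)
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()                # softmax, sums to 1
    return np.tensordot(weights, stacked, axes=1)    # (num_points, dim)


def aux_loss_weights(num_decoder_layers):
    """Weight for decoder layer i (assumed 0-indexed): 0.5 + 0.5 * (i + 1) / N."""
    n = num_decoder_layers
    return [0.5 + 0.5 * (i + 1) / n for i in range(n)]


# Four encoder layers with constant outputs 0, 1, 2, 3 make the fusion easy to check.
layers = [np.full((2, 4), float(v)) for v in range(4)]
fused = fuse_encoder_layers(layers, logits=np.zeros(3))  # equal weights over layers 1..3
weights = aux_loss_weights(5)
```

Under the 0-based assumption, five decoder layers receive weights rising linearly from 0.6 to 1.0, so the final layer's loss counts in full while earlier layers are down-weighted.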
Model input: plain 3-channel RGB per point. No semantic feature encoding.
Adapted from jastermark/S23DR2026:
- Gestalt-guided point sampling and COLMAP projection pipeline
- Post-processing (confidence filtering, vertex merging, gap filling)
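To give a feel for what confidence filtering and vertex merging involve, here is a minimal sketch that turns scored 6D edges into a vertex list plus index pairs. It is not the adapted pipeline: the function name `edges_to_wireframe`, the greedy clustering, and both thresholds are assumptions chosen for illustration, and gap filling is not covered.

```python
import numpy as np


def edges_to_wireframe(edges, scores, score_thresh=0.5, merge_dist=0.25):
    """Filter edges by confidence, then merge nearby endpoints into shared vertices.

    edges: (E, 6) array of (x1, y1, z1, x2, y2, z2); scores: (E,) confidences.
    score_thresh and merge_dist are illustrative values, not tuned ones.
    """
    kept = edges[scores >= score_thresh]
    endpoints = kept.reshape(-1, 3)              # (2 * E_kept, 3)

    clusters = []                                # each cluster: list of endpoint positions
    labels = []                                  # cluster index of every endpoint
    for p in endpoints:                          # greedy clustering against cluster means
        for idx, members in enumerate(clusters):
            if np.linalg.norm(p - np.mean(members, axis=0)) <= merge_dist:
                members.append(p)
                labels.append(idx)
                break
        else:
            clusters.append([p])
            labels.append(len(clusters) - 1)

    verts = np.array([np.mean(m, axis=0) for m in clusters]).reshape(-1, 3)
    pairs = {tuple(sorted((labels[2 * i], labels[2 * i + 1]))) for i in range(len(kept))}
    out_edges = sorted(p for p in pairs if p[0] != p[1])  # drop edges collapsed to a point
    return verts, out_edges


edges = np.array([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
                  [1.1, 0.0, 0.0, 1.0, 1.0, 0.0],
                  [5.0, 5.0, 5.0, 6.0, 5.0, 5.0]])
scores = np.array([0.9, 0.8, 0.1])
verts, out_edges = edges_to_wireframe(edges, scores)
```

In this toy input the low-confidence third edge is dropped, and the endpoints (1, 0, 0) and (1.1, 0, 0) merge into one vertex so the two remaining edges become connected.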
Results
| Approach | Split | F1 | IoU | HSS |
|---|---|---|---|---|
| Perceiver baseline | cleaned val | n/a | n/a | 0.350 |
| PointNet two-stage (Path B) | public test | 0.497 | 0.409 | 0.442 |
| WireframeDETR (ours) | cleaned val | 0.603 | 0.471 | 0.534 |
| WireframeDETR (ours, best) | public test | 0.664 | 0.516 | 0.575 |
Architecture
- Embedding dim: 384, Queries: 128, Encoder layers: 4, Decoder layers: 5
- ~22.7M parameters
- CDN groups: 5, λ_pos=0.4, λ_neg=0.8
- Training: AdamW lr=1e-4, OneCycle schedule, batch=14, 200 epochs, A100 80GB (~27h)
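One plausible reading of the CDN settings is that λ_pos and λ_neg bound the noise added to ground-truth edges: positives are perturbed by less than λ_pos, negatives by between λ_pos and λ_neg. The sketch below builds such queries under that reading. The function name `make_cdn_queries`, the per-coordinate uniform noise, and the scaling by edge length are all assumptions of this example and may differ from the actual training code.

```python
import numpy as np


def make_cdn_queries(gt_edges, groups=5, lam_pos=0.4, lam_neg=0.8, seed=0):
    """Build noised copies of GT edges for contrastive denoising.

    gt_edges: (G, 6) array. Returns (positives, negatives), each (groups, G, 6).
    Noise is scaled by each edge's length, which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    length = np.linalg.norm(gt_edges[:, 3:] - gt_edges[:, :3], axis=1)  # (G,)
    scale = length[None, :, None]                                      # broadcasts to (1, G, 1)
    shape = (groups,) + gt_edges.shape

    sign = rng.choice([-1.0, 1.0], size=shape)
    # Positives: per-coordinate offset magnitude in [0, lam_pos) * edge length.
    pos_mag = rng.uniform(0.0, lam_pos, size=shape)
    # Negatives: offset magnitude in [lam_pos, lam_neg) * edge length, so they sit further away.
    neg_mag = rng.uniform(lam_pos, lam_neg, size=shape)

    positives = gt_edges[None] + sign * pos_mag * scale
    negatives = gt_edges[None] + sign * neg_mag * scale
    return positives, negatives


gt_edges = np.array([[0.0, 0.0, 0.0, 2.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]])
positives, negatives = make_cdn_queries(gt_edges)
```

In the usual contrastive denoising setup, positive queries are trained to reconstruct their source edge while negative queries are trained to predict "no edge", which teaches the decoder to reject near-duplicates.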
Checkpoint
wireframe_detr_cdn_multiscale_384d_128q.pth: plain RGB, feature_dim=3
Inference
from s23dr_2026.model import get_model, load_checkpoint_compat
from s23dr_2026.inference import predict_wireframe_v2
import torch
# Load the checkpoint on CPU, build the model from it, then restore the weights.
ckpt = torch.load("wireframe_detr_cdn_multiscale_384d_128q.pth", map_location="cpu")
model = get_model(ckpt)
load_checkpoint_compat(model, ckpt)
model.eval().to("cuda")
# `scene` is one input sample; it is not defined in this snippet.
verts, edges = predict_wireframe_v2(scene, model, "cuda")
Training
# via Modal
modal run pipeline.py --step train --name my-run
# local
python -m s23dr_2026.train \
--name my-run \
--ply_dir /path/to/ply_data \
--embed_dim 384 --num_queries 128 \
--num_encoder_layers 4 --num_decoder_layers 5 \
--use_cdn --cdn_groups 5 \
--scheduler onecycle --num_epochs 200 \
--batch_size 14 --device cuda
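For readers unfamiliar with the schedule, the snippet below shows how AdamW and a one-cycle learning-rate schedule are commonly wired together in PyTorch. It is a toy sketch rather than the training script: the linear layer stands in for the real network, the epoch and step counts are shrunk, and treating lr=1e-4 as the schedule's peak (`max_lr`) is an assumption.

```python
import torch

model = torch.nn.Linear(3, 6)            # stand-in for the real network
steps_per_epoch, num_epochs = 10, 2      # toy values; the actual run used 200 epochs

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, epochs=num_epochs, steps_per_epoch=steps_per_epoch
)

lrs = []
for _ in range(num_epochs * steps_per_epoch):
    lrs.append(scheduler.get_last_lr()[0])           # learning rate used for this step
    loss = model(torch.randn(14, 3)).pow(2).mean()   # dummy loss on a batch of 14
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                 # one scheduler step per batch
```

With PyTorch's default settings the learning rate warms up from `max_lr / 25` over the first 30% of steps, then anneals to a much smaller value by the end of training.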
Credits
- DN-DETR: contrastive denoising training
- S23DR 2026 organisers: challenge and baseline
- jastermark/S23DR2026: COLMAP projection pipeline and post-processing
- Modal Labs: GPU compute