S23DR 2026 — WireframeDETR

Submission to the S23DR 2026 Challenge.

Public test HSS: 0.575 (F1=0.664, IoU=0.516)

This project was built on Modal GPU credits left over from a previous hackathon. Each run took hours, so the budget kept experiments deliberate. Several training configurations were explored, but a full factorial study of every contribution wasn't feasible.

Approach

End-to-end 3D wireframe prediction via DETR-style set prediction over COLMAP point clouds. Each learned query regresses one edge as a pair of 3D endpoints (x1,y1,z1,x2,y2,z2), six values in total. At training time, Hungarian matching assigns predictions to ground-truth edges one-to-one.
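The one-to-one assignment between predicted and ground-truth edges can be illustrated with a small pure-Python sketch. Everything in it is an assumption made for illustration: the L1 cost, the choice to ignore endpoint order, and the helper names `edge_cost` and `match_edges` do not come from this repository. A brute-force search stands in for the Hungarian algorithm so the example needs no dependencies; a training loop would more typically call `scipy.optimize.linear_sum_assignment` on the same cost matrix.

```python
from itertools import permutations


def edge_cost(pred, gt):
    """L1 distance between two 6D edges, ignoring endpoint order (assumed cost)."""
    direct = sum(abs(p - g) for p, g in zip(pred, gt))
    flipped_gt = gt[3:] + gt[:3]  # same edge with its endpoints swapped
    flipped = sum(abs(p - g) for p, g in zip(pred, flipped_gt))
    return min(direct, flipped)


def match_edges(preds, gts):
    """Return ([(pred_index, gt_index), ...], total_cost) with minimal total cost.

    Brute force over all assignments, so only practical for a handful of
    edges. Assumes len(preds) >= len(gts).
    """
    best_cost, best_pairs = float("inf"), []
    for pred_ids in permutations(range(len(preds)), len(gts)):
        cost = sum(edge_cost(preds[p], gts[g]) for g, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return best_pairs, best_cost


# Three predictions, two ground-truth edges (the first GT has swapped endpoints).
preds = [(0, 0, 0, 1, 0, 0), (5, 5, 5, 6, 5, 5), (1.1, 0, 0, 0, 0.1, 0)]
gts = [(1, 0, 0, 0, 0, 0), (5, 5, 5, 6, 5, 5)]
pairs, total = match_edges(preds, gts)
```

Predictions left unmatched (index 2 here) would be supervised as "no edge" in a DETR-style loss.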

Our contributions:

Contrastive Denoising Training (CDN): adapted from DN-DETR; injects noised copies of ground-truth edges as denoising queries alongside the learned queries, to stabilise Hungarian matching in early epochs
Multi-scale encoder: learned softmax-weighted average of the last K=3 encoder layer outputs, giving the decoder access to both fine-grained and abstract representations
Progressive auxiliary loss weighting: decoder layer i of N is weighted at 0.5 + 0.5·(i+1)/N, so later layers contribute more to the loss
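The last two items can be written out in a few lines of plain Python. This is a sketch under stated assumptions, not the repository's code: it assumes i is 0-indexed and N is the number of decoder layers, and the function names `aux_loss_weights` and `fuse_layers` are hypothetical. In the model the fusion logits would be learned parameters and the layer outputs would be tensors rather than lists.

```python
import math


def aux_loss_weights(num_decoder_layers):
    """Weight for decoder layer i (assumed 0-indexed): 0.5 + 0.5 * (i + 1) / N."""
    n = num_decoder_layers
    return [0.5 + 0.5 * (i + 1) / n for i in range(n)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_outputs, logits):
    """Softmax-weighted average of per-layer feature vectors (one weight per layer)."""
    weights = softmax(logits)
    dim = len(layer_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, layer_outputs))
        for d in range(dim)
    ]


# With N=5 the weights rise linearly from about 0.6 to 1.0.
weights = aux_loss_weights(5)

# K=3 layer outputs with equal logits reduce to a plain mean.
fused = fuse_layers([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.0, 0.0, 0.0])
```

Under the 0-indexed reading the final decoder layer gets weight 1.0 and the first gets 0.6, so early layers are still supervised but count for less.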

Model input: plain 3-channel RGB per point. No semantic feature encoding.

Adapted from jastermark/S23DR2026:

Gestalt-guided point sampling and COLMAP projection pipeline
Post-processing (confidence filtering, vertex merging, gap filling)
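For readers unfamiliar with this kind of post-processing, the first two steps can be sketched as follows. This is an illustrative stand-in, not the code from jastermark/S23DR2026: the function names, the 0.5 threshold, the 0.25 merge radius, and the greedy running-mean merge are all assumptions. Gap filling is left out.

```python
import math


def filter_edges(edges, scores, threshold=0.5):
    """Keep edges whose confidence is at least `threshold` (hypothetical value)."""
    return [e for e, s in zip(edges, scores) if s >= threshold]


def merge_vertices(edges, radius=0.25):
    """Greedily snap endpoints within `radius` of an existing vertex onto it.

    edges: list of (p, q) pairs, each endpoint an (x, y, z) tuple.
    Returns (vertices, index_edges) where index_edges refer to `vertices`.
    """
    vertices, counts, index_edges = [], [], []

    def vertex_id(p):
        for i, v in enumerate(vertices):
            if math.dist(p, v) <= radius:
                # Running mean keeps the merged vertex near its cluster centre.
                n = counts[i]
                vertices[i] = tuple((vc * n + pc) / (n + 1) for vc, pc in zip(v, p))
                counts[i] = n + 1
                return i
        vertices.append(tuple(p))
        counts.append(1)
        return len(vertices) - 1

    for p, q in edges:
        a, b = vertex_id(p), vertex_id(q)
        if a != b:  # drop edges that collapse onto a single vertex
            index_edges.append((a, b))
    return vertices, index_edges


raw_edges = [
    ((0, 0, 0), (1, 0, 0)),
    ((1.1, 0, 0), (1, 1, 0)),
    ((5, 5, 5), (5, 5, 5.1)),
]
scores = [0.9, 0.8, 0.2]
kept = filter_edges(raw_edges, scores)
vertices, index_edges = merge_vertices(kept)
```

In this toy input the low-confidence third edge is dropped, and the endpoints (1, 0, 0) and (1.1, 0, 0) merge into one shared vertex, which turns two separate segments into a connected polyline.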

Results

Approach                     Split        F1     IoU    HSS
Perceiver baseline           cleaned val  -      -      0.350
PointNet two-stage (Path B)  public test  0.497  0.409  0.442
WireframeDETR (ours)         cleaned val  0.603  0.471  0.534
WireframeDETR (ours, best)   public test  0.664  0.516  0.575

Architecture

Embedding dim: 384, Queries: 128, Encoder layers: 4, Decoder layers: 5
~22.7M parameters
CDN groups: 5, λ_pos=0.4, λ_neg=0.8
Training: AdamW lr=1e-4, OneCycle schedule, batch=14, 200 epochs, A100 80GB (~27h)
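To make the CDN settings concrete, here is one way the denoising queries could be generated from ground-truth edges. It is a sketch built on assumptions the model card does not confirm: that λ_pos and λ_neg bound the noise magnitude, that noise is scaled by edge length, that each endpoint is shifted independently, and that positives draw noise below λ_pos while negatives draw it between λ_pos and λ_neg. The helper names are hypothetical.

```python
import math
import random


def noised_edge(edge, low, high, rng):
    """Shift each endpoint by an offset of length in [low, high] * edge length."""
    p, q = edge[:3], edge[3:]
    length = math.dist(p, q)
    out = []
    for point in (p, q):
        # Random direction on the unit sphere via normalised Gaussian samples.
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in d)) or 1.0
        scale = rng.uniform(low, high) * length
        out.extend(c + scale * dc / norm for c, dc in zip(point, d))
    return tuple(out)


def make_cdn_queries(gt_edges, groups=5, lam_pos=0.4, lam_neg=0.8, seed=0):
    """Return (positives, negatives): `groups` lists of noised edges each."""
    rng = random.Random(seed)
    positives, negatives = [], []
    for _ in range(groups):
        # Positives stay close to the ground truth (assumed: noise below lam_pos).
        positives.append([noised_edge(e, 0.0, lam_pos, rng) for e in gt_edges])
        # Negatives are pushed further out (assumed: noise in [lam_pos, lam_neg]).
        negatives.append([noised_edge(e, lam_pos, lam_neg, rng) for e in gt_edges])
    return positives, negatives


gt_edges = [(0.0, 0.0, 0.0, 2.0, 0.0, 0.0)]
positives, negatives = make_cdn_queries(gt_edges)
```

In a contrastive denoising setup, positive queries are trained to reconstruct their ground-truth edge while negative queries are trained to predict "no object", which discourages near-duplicate detections around the same target.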

Checkpoint

wireframe_detr_cdn_multiscale_384d_128q.pth — plain RGB, feature_dim=3

Inference

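import torch

from s23dr_2026.model import get_model, load_checkpoint_compat
from s23dr_2026.inference import predict_wireframe_v2

# Use the GPU when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the checkpoint on CPU first, then build the model from it.
ckpt = torch.load("wireframe_detr_cdn_multiscale_384d_128q.pth", map_location="cpu")
model = get_model(ckpt)
load_checkpoint_compat(model, ckpt)
model.eval().to(device)

# `scene` is one input scene in the format predict_wireframe_v2 expects;
# it is not defined in this snippet.
verts, edges = predict_wireframe_v2(scene, model, device)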

Training

# via Modal
modal run pipeline.py --step train --name my-run

# local
python -m s23dr_2026.train \
  --name my-run \
  --ply_dir /path/to/ply_data \
  --embed_dim 384 --num_queries 128 \
  --num_encoder_layers 4 --num_decoder_layers 5 \
  --use_cdn --cdn_groups 5 \
  --scheduler onecycle --num_epochs 200 \
  --batch_size 14 --device cuda

Credits

DN-DETR — contrastive denoising training
S23DR 2026 organisers — challenge and baseline
jastermark/S23DR2026 — COLMAP projection pipeline and post-processing
Modal Labs — GPU compute

Paper for StarAtNyte1/s23dr-2026-submission

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising. arXiv:2203.01305, published Mar 2, 2022.