Venice-H1: Failure-Aware Query Re-Ranking for Referring Image Segmentation

Nicolò Savioli, Ph.D. — OdaxAI Research
nicolo.savioli@odaxai.com · odaxai.com

Architecture Overview

Venice-H1 pipeline. A frozen DeRIS backbone generates N=10 candidate masks. Multi-scale grid signatures encode the spatial quality of each candidate. The Failure Re-Ranker gates intervention: it overrides Query 0 only when it is confident that the default choice is wrong.

The Failure-Case Bottleneck



7–18% of samples generate 40–68% of total error.
Failure cases form a "triangle of opportunity".

Multi-Scale Grid Signatures

Compact 675-dim spatial descriptors pooled at 4×4, 8×8, 16×16 grids per candidate mask.

Multi-scale grid cells inspired by entorhinal cortex representations.
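As a rough illustration of the pooling idea, the sketch below mean-pools a binary mask onto 4×4, 8×8, and 16×16 grids and concatenates the cell values. The function names and the choice of mean pooling are assumptions, not the released implementation. The three grids alone give 16 + 64 + 256 = 336 values, so the 675-dim descriptor must include components that this card does not describe; the sketch does not try to reproduce them.

```python
# Hypothetical sketch: mean-pool a binary mask onto several square grids.
# Names and pooling choice are assumptions, not the Venice-H1 code.

def grid_pool(mask, g):
    """Mean of `mask` (list of equal-length rows) inside each cell of a g x g grid."""
    h, w = len(mask), len(mask[0])
    cells = []
    for gy in range(g):
        y0, y1 = gy * h // g, (gy + 1) * h // g
        for gx in range(g):
            x0, x1 = gx * w // g, (gx + 1) * w // g
            total = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = max(1, (y1 - y0) * (x1 - x0))  # guard against empty cells
            cells.append(total / area)
    return cells

def grid_signature(mask, scales=(4, 8, 16)):
    """Concatenate pooled cells over all scales, coarse to fine."""
    sig = []
    for g in scales:
        sig.extend(grid_pool(mask, g))
    return sig

# Toy 32x32 mask whose left half is foreground.
mask = [[1 if x < 16 else 0 for x in range(32)] for _ in range(32)]
signature = grid_signature(mask)
print(len(signature))  # 16 + 64 + 256 = 336
```

Coarse cells summarize where the mask's mass lies, while fine cells capture boundary detail, which is one plausible reason to combine scales.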

Model Description

Venice-H1 is a lightweight, backbone-decoupled re-ranking module for Referring Image Segmentation (RIS). It detects when the default query selection (Query 0) fails and selects a better alternative. Its main elements are:

Multi-Scale Grid Signatures: 4×4, 8×8, 16×16 spatial pooling → 675-dim descriptors
Failure Gate: binary classifier predicting whether Query 0 is suboptimal
Gain Predictor: IoU-regression head estimating improvement per alternative query
Backbone-decoupled: works on cached features, no retraining of DeRIS required
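The way the gate and the gain predictor could combine at inference time can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `rerank`, the threshold `tau`, and the rule "switch only if the best predicted gain is positive" are hypothetical and may differ from the released code.

```python
# Hypothetical decision rule: keep Query 0 unless the gate fires
# and some alternative is predicted to improve IoU.

def rerank(p_fail, gains, tau=0.5):
    """Return the index of the query to use.

    p_fail: gate probability that Query 0 is suboptimal.
    gains:  predicted IoU gain over Query 0 for alternatives 1..N-1.
    tau:    gate threshold (assumed value, not from the paper).
    """
    if p_fail < tau or not gains:
        return 0  # gate not confident: keep the default query
    best = max(range(len(gains)), key=lambda i: gains[i])
    if gains[best] <= 0.0:
        return 0  # no alternative is predicted to help
    return best + 1  # gains[0] corresponds to Query 1

print(rerank(0.2, [0.3, 0.1]))         # gate below tau: Query 0
print(rerank(0.9, [0.05, 0.3, -0.1]))  # gate fires: best alternative
```

Raising `tau` makes the module more conservative, trading fewer interventions for a lower chance of a harmful switch.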

Architecture: 3-layer pre-norm Transformer encoder, 8 heads, hidden_dim=512
Parameters: 11,296,258 (11.3M), identical to the count reported in the paper
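For a sense of where the parameters sit, the sketch below counts the weights of three standard Transformer encoder layers at hidden_dim=512. The feed-forward width of 2048 is an assumption (it is the PyTorch default for `nn.TransformerEncoderLayer`, but this card does not state it), so the split between encoder and the remaining modules is only indicative. The number of heads does not affect the count.

```python
# Parameter count of a standard Transformer encoder layer.
# ffn_dim=2048 is an assumed value, not taken from the model card.

def encoder_layer_params(d_model=512, ffn_dim=2048):
    attn = 4 * d_model * d_model + 4 * d_model       # Q, K, V, output projections with biases
    ffn = 2 * d_model * ffn_dim + ffn_dim + d_model  # two linear layers with biases
    norms = 2 * 2 * d_model                          # two LayerNorms (scale and shift)
    return attn + ffn + norms

layers = 3
encoder_total = layers * encoder_layer_params()
reported_total = 11_296_258  # figure stated in the model card
remainder = reported_total - encoder_total

print(encoder_total)  # weights in the three encoder layers under the assumption
print(remainder)      # left for input projections, heads, and any other modules
```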

Results

On failure cases (where Venice-H1 intervenes)

Positive Δ across all 8 evaluation splits.

Metric	Value
Parameters	11,296,258
Δ_fail (mIoU on failures)	+1.824
AUC (failure detection)	0.778
Δ_full (overall mIoU)	+0.039
Q0 mIoU	86.469
Selected mIoU	86.509
Oracle mIoU	89.691
Harmful-switch rate	< 0.6%
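The quantities in the table can be related to each other with a short sketch that computes them from per-sample IoUs. The definitions used here are assumptions made for illustration: a failure case is taken to be a sample where some alternative query beats Query 0, and a harmful switch is a sample where the selected query scores below Query 0. The paper's exact definitions may differ.

```python
# Hypothetical metric helper; definitions are assumptions, see the text above.

def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def summarize(ious, selected):
    """ious[i][k]: IoU of candidate k on sample i (k = 0 is the default query).
    selected[i]: candidate index chosen by the re-ranker for sample i."""
    q0 = [row[0] for row in ious]
    sel = [row[k] for row, k in zip(ious, selected)]
    oracle = [max(row) for row in ious]
    is_fail = [o > a for o, a in zip(oracle, q0)]  # assumed failure definition
    return {
        "q0_miou": _mean(q0),
        "selected_miou": _mean(sel),
        "oracle_miou": _mean(oracle),
        "delta_full": _mean(sel) - _mean(q0),
        "delta_fail": _mean([s - a for s, a, f in zip(sel, q0, is_fail) if f]),
        "harmful_rate": _mean([1.0 if s < a else 0.0 for s, a in zip(sel, q0)]),
    }

# Toy data: 4 samples, 2 candidates each; the re-ranker switches on sample 1 only.
ious = [[0.9, 0.5], [0.4, 0.8], [0.7, 0.6], [0.5, 0.7]]
stats = summarize(ious, selected=[0, 1, 0, 0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because most samples are not failures, a large gain on the failure subset translates into a much smaller change in overall mIoU, which matches the gap between Δ_fail and Δ_full in the table.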

Failure Gate Analysis



ROC curves across splits (AUC 0.78–0.82).
Coverage-risk trade-off at different thresholds τ.
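A coverage-risk curve of this kind can be traced by sweeping the gate threshold τ. In the sketch below, coverage is the fraction of samples on which the gate fires and risk is the fraction of those samples where Query 0 was in fact fine. Both definitions, and the function name `coverage_risk`, are assumptions for illustration rather than the paper's protocol.

```python
# Hypothetical threshold sweep over gate probabilities.

def coverage_risk(p_fail, is_failure, taus):
    """Return a list of (tau, coverage, risk) tuples."""
    n = len(p_fail)
    curve = []
    for tau in taus:
        flagged = [f for p, f in zip(p_fail, is_failure) if p >= tau]
        coverage = len(flagged) / n if n else 0.0
        # Risk: share of flagged samples that were not real failures.
        risk = sum(1 for f in flagged if not f) / len(flagged) if flagged else 0.0
        curve.append((tau, coverage, risk))
    return curve

# Toy gate outputs and ground-truth failure labels.
p_fail = [0.9, 0.8, 0.6, 0.3, 0.1]
is_failure = [True, True, False, True, False]
curve = coverage_risk(p_fail, is_failure, taus=[0.5, 0.85])
for tau, cov, risk in curve:
    print(f"tau={tau:.2f} coverage={cov:.2f} risk={risk:.2f}")
```

A higher τ lowers coverage and usually lowers risk, which is the trade-off the plot summarizes.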

Qualitative Results

Re-ranking on RefCOCO val. Each row shows the input, the ground truth, the default query (red, fails), and the Venice-H1 corrected selection (blue). Venice-H1 recovers IoU > 84% in every example shown.

Ablation Study

Configuration	Δ_fail	Gate AUC
BASE only (no grid)	+1.01	0.812
4×4 only	+1.01	0.821
8×8 only	+0.87	0.790
16×16 only	+1.00	0.828
BASE + all grids (ours)	+1.22	0.807

Medical Cross-Domain Transfer

Zero-shot transfer to MS-CXR (+1.16 mIoU) and M3D-RefSeg-2D (+0.51 mIoU) without fine-tuning.

External Dependencies

Component	Model	Paper
Backbone	DeRIS-L	Dai et al. (2025)
Visual Encoder	Swin-Large	Liu et al. (2021)
Language Encoder	BEiT-3	Wang et al. (2023)
Mask Generator	Mask2Former	Cheng et al. (2022)

Venice-H1 does not include these weights. You need a running DeRIS-L instance to extract features.

Quick Start

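import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
ckpt_path = hf_hub_download(repo_id="OdaxAI/venice-h1", filename="venice_h1_deris_l.pt")
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# The checkpoint is a dict holding the training config, evaluation metrics, and model weights.
print("Config:", ckpt["config"])
print("Metrics:", ckpt["metrics"])
# Count parameters by summing element counts over the stored tensors.
print("Parameters:", sum(v.numel() for v in ckpt["model"].values() if hasattr(v, "numel")))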

Reproduce paper results (no dataset needed):

git clone https://github.com/odaxai/Venice-H1.git
cd Venice-H1 && pip install -r requirements.txt && pip install -e .
python reproduce_results.py --verify_only

── Architecture Verification ───────────────────
  Parameters : 11,296,258  ✓ MATCH

── Paper Cross-Check (RefCOCO val) ─────────────
  ✓ delta_fail : 1.8244  (paper: 1.824)
  ✓ auc_fail   : 0.7776  (paper: 0.778)
  ✓ delta_full : 0.0392  (paper: 0.039)

Files in Repository

File	Description
`venice_h1_deris_l.pt`	Trained checkpoint — 11.3M params, DeRIS-L backbone
`venice_h1_deris_l_metrics.json`	Full evaluation metrics
`config.yaml`	Training hyperparameters
`venice_h1/`	Python package (model code)
`train.py`	Training script
`evaluate.py`	Evaluation script
`reproduce_results.py`	One-command paper reproduction
`scripts/extract_features.py`	Feature extraction from DeRIS-L

Training Details

Parameter	Value
Optimizer	AdamW
Learning rate	5e-4
Weight decay	1e-4
Batch size	512
Epochs	20
Scheduler	Cosine with 3-epoch warmup
Loss: L_gate	Focal BCE (γ=2.0)
Loss: L_gain	Smooth-L1 (λ=5.0)
Mixed precision	FP16
Seed	42
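The loss and schedule entries in the table can be made concrete with a small sketch. Several details are assumptions not stated in this card: that λ multiplies the gain term in a sum L_gate + λ·L_gain, that Smooth-L1 uses β=1, that warmup is linear, and that the cosine phase decays to zero over the remaining epochs. The function names are hypothetical.

```python
import math

# Hypothetical per-sample losses and per-epoch learning rate.

def focal_bce(p, y, gamma=2.0, eps=1e-7):
    """Focal binary cross-entropy for predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def smooth_l1(pred, target, beta=1.0):
    """Quadratic near zero, linear for large errors."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def total_loss(p_gate, y_gate, gain_pred, gain_true, lam=5.0):
    """Assumed combination: L_gate + lam * L_gain."""
    return focal_bce(p_gate, y_gate) + lam * smooth_l1(gain_pred, gain_true)

def lr_at_epoch(epoch, base_lr=5e-4, warmup=3, total=20):
    """Linear warmup for `warmup` epochs, then cosine decay (assumed shape)."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / (total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(total_loss(0.5, 1, 0.2, 0.0))
print([round(lr_at_epoch(e), 6) for e in (0, 2, 3, 10, 19)])
```

The focal term down-weights easy gate decisions, which is a common choice when the positive class (failure cases) is a small minority.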

Citation

@article{savioli2026veniceh1,
  title   = {Venice-H1: Failure-Aware Query Re-Ranking with Multi-Scale Grid Signatures
             for Referring Image Segmentation},
  author  = {Savioli, Nicol\`{o}},
  journal = {arXiv preprint arXiv:2606.22546},
  year    = {2026},
  note    = {OdaxAI Research},
}

License
