Venice-H1: Failure-Aware Query Re-Ranking for Referring Image Segmentation

Paper GitHub License: MIT

Nicolò Savioli, Ph.D., OdaxAI Research
nicolo.savioli@odaxai.com · odaxai.com


Architecture Overview

Venice-H1 pipeline. A frozen DeRIS backbone generates N=10 candidate masks. Multi-scale grid signatures encode spatial quality. The Failure Re-Ranker gates intervention: it only overrides Query-0 when confident the default choice is wrong.


The Failure-Case Bottleneck

Error Budget: 7–18% of samples generate 40–68% of total error.
IoU Scatter: failure cases form a "triangle of opportunity".
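
To make the error-budget claim concrete, the sketch below computes what fraction of samples count as failures and what share of the total error they carry. It assumes per-sample error is 1 − IoU and that a failure is any sample whose Query-0 IoU falls below a cutoff of 0.5. Both definitions, the function name `error_budget`, and the toy IoU values are illustrative assumptions, not the paper's protocol.

```python
def error_budget(ious, failure_iou=0.5):
    """Return (failure_fraction, failure_error_share) for a list of IoUs.

    Assumptions (not from the paper): error = 1 - IoU, and a sample is a
    failure when its IoU is below `failure_iou`.
    """
    errors = [1.0 - iou for iou in ious]
    total_error = sum(errors)
    failure_errors = [e for e, iou in zip(errors, ious) if iou < failure_iou]
    failure_fraction = len(failure_errors) / len(ious)
    failure_share = sum(failure_errors) / total_error if total_error > 0 else 0.0
    return failure_fraction, failure_share


# Toy data: nine well-segmented samples and one clear failure.
toy_ious = [0.95, 0.92, 0.90, 0.94, 0.91, 0.93, 0.96, 0.89, 0.90, 0.10]
fraction, share = error_budget(toy_ious)
print(f"{fraction:.0%} of samples carry {share:.0%} of the total error")
```

Even in this toy set, a single failing sample dominates the error, which is the pattern the error-budget figure describes.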

Multi-Scale Grid Signatures

Grid Signatures: compact 675-dim spatial descriptors pooled at 4×4, 8×8, and 16×16 grids per candidate mask.

Grid Cells: multi-scale grid cells inspired by entorhinal cortex representations.
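
As a rough illustration of multi-scale grid pooling, the sketch below averages a binary mask over 4×4, 8×8, and 16×16 grids and concatenates the cell values. It is written in dependency-free Python for readability, and the function names are hypothetical. The output here has 16 + 64 + 256 = 336 values per mask; how the 675-dim descriptor is assembled is not specified in this card, so the sketch does not attempt to reproduce it.

```python
def grid_pool(mask, grid):
    """Average-pool a 2D mask (list of rows) into grid * grid cell means.

    Assumes the mask height and width are divisible by `grid`.
    """
    height, width = len(mask), len(mask[0])
    cell_h, cell_w = height // grid, width // grid
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0.0
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                for x in range(gx * cell_w, (gx + 1) * cell_w):
                    total += mask[y][x]
            cells.append(total / (cell_h * cell_w))
    return cells


def multi_scale_signature(mask, grids=(4, 8, 16)):
    """Concatenate pooled cells from every grid resolution."""
    signature = []
    for grid in grids:
        signature.extend(grid_pool(mask, grid))
    return signature


# Toy 32x32 mask whose left half is foreground.
toy_mask = [[1.0 if x < 16 else 0.0 for x in range(32)] for _ in range(32)]
signature = multi_scale_signature(toy_mask)
print(len(signature))  # 16 + 64 + 256 cells
```

Coarse grids summarize where the mask sits overall, while finer grids capture boundary detail, so the concatenation describes the same mask at several spatial resolutions.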


Model Description

Venice-H1 is a lightweight, backbone-decoupled re-ranking module for Referring Image Segmentation (RIS). It detects when the default query selection fails and selects a better alternative using:

  • Multi-Scale Grid Signatures: 4×4, 8×8, 16×16 spatial pooling → 675-dim descriptors
  • Failure Gate: binary classifier predicting whether Query 0 is suboptimal
  • Gain Predictor: IoU-regression head estimating improvement per alternative query
  • Backbone-decoupled: works on cached features, no retraining of DeRIS required

Architecture: 3-layer pre-norm Transformer encoder, 8 heads, hidden_dim=512
Parameters: 11,296,258 (11.3M), matching the paper exactly
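
The interplay between the failure gate and the gain predictor can be sketched as a simple decision rule. This is a minimal sketch, assuming the gate outputs a failure probability for Query 0 and the gain predictor outputs one predicted IoU gain per candidate. The threshold `tau`, the positive-gain requirement, and the function name `select_query` are illustrative assumptions rather than the released inference code.

```python
def select_query(p_fail, predicted_gains, tau=0.5):
    """Pick a candidate index, keeping Query 0 unless the gate fires.

    p_fail: gate probability that Query 0 is suboptimal.
    predicted_gains: predicted IoU gain over Query 0 for each candidate
        (index 0 is Query 0 itself, so its gain is 0 by convention).
    tau: gate threshold (hypothetical default).
    """
    if p_fail < tau:
        return 0  # gate not confident: keep the default selection
    best = max(range(len(predicted_gains)), key=lambda i: predicted_gains[i])
    # Switch only when the best alternative is predicted to improve on Query 0.
    return best if predicted_gains[best] > 0.0 else 0


# N=10 candidates, as in the pipeline description.
gains = [0.0, -0.05, 0.12, 0.03, -0.20, 0.01, -0.02, 0.08, -0.10, 0.00]
print(select_query(p_fail=0.2, predicted_gains=gains))  # gate silent
print(select_query(p_fail=0.9, predicted_gains=gains))  # gate fires
```

Defaulting to Query 0 whenever the gate is unsure is what keeps the harmful-switch rate low: the module can only change the output on samples it flags as likely failures.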


Results

On failure cases (where Venice-H1 intervenes)

Per-split improvement: positive Δ across all 8 evaluation splits.

| Metric | Value |
|---|---|
| Parameters | 11,296,258 |
| Δ_fail (mIoU on failures) | +1.824 |
| AUC (failure detection) | 0.778 |
| Δ_full (overall mIoU) | +0.039 |
| Q0 mIoU | 86.469 |
| Selected mIoU | 86.509 |
| Oracle mIoU | 89.691 |
| Harmful-switch rate | < 0.6% |
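
The table entries can be cross-checked with a little arithmetic: the overall gain is the difference between the selected and Query-0 mIoU, and the oracle row bounds how much any re-ranker could recover. The sketch below uses only the values reported above; the variable names are illustrative.

```python
q0_miou = 86.469
selected_miou = 86.509
oracle_miou = 89.691

delta_full = selected_miou - q0_miou   # gain achieved by Venice-H1
headroom = oracle_miou - q0_miou       # gain of a perfect re-ranker
recovered = delta_full / headroom      # share of the headroom recovered

print(f"delta_full = {delta_full:.3f}")
print(f"headroom   = {headroom:.3f}")
print(f"recovered  = {recovered:.1%}")
```

The difference of the rounded table values is 0.040, which agrees with the reported overall gain of +0.039 up to rounding. The oracle headroom of about 3.2 mIoU points shows how much room remains for better query selection.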

Failure Gate Analysis

ROC Curves: ROC curves across splits, AUC 0.78–0.82.
Coverage Risk: coverage-risk trade-off at different τ.
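
A coverage-risk curve can be traced by sweeping the gate threshold τ. The sketch below assumes coverage is the fraction of samples on which the gate fires (score ≥ τ) and risk is the fraction of those firings where Query 0 was actually fine, so an intervention would be unwarranted. These definitions, the function name, and the toy scores are assumptions for illustration and may differ from the paper's.

```python
def coverage_risk(scores, is_failure, tau):
    """Return (coverage, risk) of the gate at threshold tau.

    scores: gate failure probabilities, one per sample.
    is_failure: True where Query 0 really is suboptimal.
    """
    fired = [fail for score, fail in zip(scores, is_failure) if score >= tau]
    coverage = len(fired) / len(scores)
    # Risk is undefined when the gate never fires; report 0.0 in that case.
    risk = sum(1 for fail in fired if not fail) / len(fired) if fired else 0.0
    return coverage, risk


toy_scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
toy_labels = [True, True, False, True, False, False, True, False]

for tau in (0.25, 0.5, 0.75):
    coverage, risk = coverage_risk(toy_scores, toy_labels, tau)
    print(f"tau={tau:.2f}  coverage={coverage:.2f}  risk={risk:.2f}")
```

In this toy data, raising the threshold lowers both coverage and risk, which is the trade-off the coverage-risk plot summarizes.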

Qualitative Results

Re-ranking on RefCOCO val. Each row: input, ground truth, default query (red, fails), Venice-H1 corrected selection (blue). Venice-H1 recovers IoU > 84% in all cases.


Ablation Study

| Configuration | Δ_fail | Gate AUC |
|---|---|---|
| BASE only (no grid) | +1.01 | 0.812 |
| 4×4 only | +1.01 | 0.821 |
| 8×8 only | +0.87 | 0.790 |
| 16×16 only | +1.00 | 0.828 |
| BASE + all grids (ours) | +1.22 | 0.807 |

Medical Cross-Domain Transfer

Zero-shot transfer to MS-CXR (+1.16 mIoU) and M3D-RefSeg-2D (+0.51 mIoU) without fine-tuning.


External Dependencies

| Component | Model | Paper |
|---|---|---|
| Backbone | DeRIS-L | Dai et al. (2025) |
| Visual Encoder | Swin-Large | Liu et al. (2021) |
| Language Encoder | BEiT-3 | Wang et al. (2023) |
| Mask Generator | Mask2Former | Cheng et al. (2022) |

Venice-H1 does not include these weights. You need a running DeRIS-L instance to extract features.


Quick Start

import torch
from huggingface_hub import hf_hub_download

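# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
ckpt_path = hf_hub_download(repo_id="OdaxAI/venice-h1", filename="venice_h1_deris_l.pt")
# weights_only=False unpickles arbitrary Python objects, so load only checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

print("Config:", ckpt["config"])
print("Metrics:", ckpt["metrics"])
# Count parameters by summing element counts over the tensors in the state dict.
print("Parameters:", sum(v.numel() for v in ckpt["model"].values() if hasattr(v, "numel")))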

Reproduce paper results (no dataset needed):

git clone https://github.com/odaxai/Venice-H1.git
cd Venice-H1 && pip install -r requirements.txt && pip install -e .
python reproduce_results.py --verify_only

Expected output:

── Architecture Verification ───────────────────
  Parameters : 11,296,258  ✓ MATCH

── Paper Cross-Check (RefCOCO val) ─────────────
  ✓ delta_fail : 1.8244  (paper: 1.824)
  ✓ auc_fail   : 0.7776  (paper: 0.778)
  ✓ delta_full : 0.0392  (paper: 0.039)
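
The cross-check above amounts to comparing each recomputed metric with the paper value at the paper's three-decimal precision. The following is a minimal sketch of such a comparison, using the numbers from the output above. The function name and the half-unit tolerance are assumptions, not necessarily what `reproduce_results.py` does internally.

```python
def matches_paper(measured, paper, decimals=3):
    """True when `measured` is within half a unit of the last reported decimal."""
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(measured - paper) <= tolerance


checks = {
    "delta_fail": (1.8244, 1.824),
    "auc_fail": (0.7776, 0.778),
    "delta_full": (0.0392, 0.039),
}

for name, (measured, paper) in checks.items():
    status = "ok" if matches_paper(measured, paper) else "MISMATCH"
    print(f"{name:<11}: {measured:.4f} (paper: {paper}) {status}")
```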

Files in Repository

| File | Description |
|---|---|
| venice_h1_deris_l.pt | Trained checkpoint, 11.3M params, DeRIS-L backbone |
| venice_h1_deris_l_metrics.json | Full evaluation metrics |
| config.yaml | Training hyperparameters |
| venice_h1/ | Python package (model code) |
| train.py | Training script |
| evaluate.py | Evaluation script |
| reproduce_results.py | One-command paper reproduction |
| scripts/extract_features.py | Feature extraction from DeRIS-L |

Training Details

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 5e-4 |
| Weight decay | 1e-4 |
| Batch size | 512 |
| Epochs | 20 |
| Scheduler | Cosine + 3-epoch warmup |
| Loss: L_gate | Focal BCE (γ=2.0) |
| Loss: L_gain | Smooth-L1 (λ=5.0) |
| Mixed precision | FP16 |
| Seed | 42 |
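
The scheduler and loss rows can be read as follows. This is a minimal sketch, assuming linear per-epoch warmup to the peak learning rate followed by cosine decay to zero, a focal BCE without class weighting, a Smooth-L1 transition point of 1.0, and λ acting as the weight on the gain loss in a sum L = L_gate + λ·L_gain. None of these details are confirmed by this card (`config.yaml` and `train.py` are the authoritative sources), and all function names are hypothetical.

```python
import math

BASE_LR, EPOCHS, WARMUP = 5e-4, 20, 3
GAMMA, LAMBDA = 2.0, 5.0


def lr_at_epoch(epoch):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay."""
    if epoch < WARMUP:
        return BASE_LR * (epoch + 1) / WARMUP
    progress = (epoch - WARMUP) / (EPOCHS - WARMUP)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


def focal_bce(p, target, gamma=GAMMA, eps=1e-7):
    """Focal binary cross-entropy for one predicted probability p."""
    p_t = p if target == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def total_loss(p_fail, is_failure, pred_gain, true_gain):
    """Assumed combination: L_gate + LAMBDA * L_gain."""
    return focal_bce(p_fail, is_failure) + LAMBDA * smooth_l1(pred_gain, true_gain)


print([round(lr_at_epoch(e), 6) for e in range(EPOCHS)])
print(total_loss(p_fail=0.9, is_failure=1, pred_gain=0.10, true_gain=0.15))
```

The focal term down-weights samples the gate already classifies confidently, which is a common choice when the positive class (failures) is rare.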

Citation

@article{savioli2026veniceh1,
  title   = {Venice-H1: Failure-Aware Query Re-Ranking with Multi-Scale Grid Signatures
             for Referring Image Segmentation},
  author  = {Savioli, Nicol\`{o}},
  journal = {arXiv preprint arXiv:2606.22546},
  year    = {2026},
  note    = {OdaxAI Research},
}

License

MIT License. © 2026 OdaxAI Research. All research conducted by Nicolò Savioli, Ph.D.
