Venice-H1: Failure-Aware Query Re-Ranking for Referring Image Segmentation
Nicolò Savioli, Ph.D. · OdaxAI Research
nicolo.savioli@odaxai.com · odaxai.com
Architecture Overview
Venice-H1 pipeline. A frozen DeRIS backbone generates N=10 candidate masks, and multi-scale grid signatures encode their spatial quality. The Failure Re-Ranker gates intervention: it overrides Query 0 only when it is confident that the default choice is wrong.
The Failure-Case Bottleneck
Multi-Scale Grid Signatures
Compact 675-dim spatial descriptors, pooled per candidate mask at 4×4, 8×8, and 16×16 grids.
Multi-scale grid cells inspired by entorhinal cortex representations.
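The card states the grid sizes but not the exact pooling recipe. The sketch below is a minimal illustration, assuming each signature is the mean value of a candidate's soft mask in every cell of the 4×4, 8×8, and 16×16 grids, concatenated. That gives 16 + 64 + 256 = 336 values; how the remaining dimensions of the 675-dim descriptor are formed is not described here. The function name `grid_signature` and the divisibility requirement are hypothetical.

```python
import numpy as np


def grid_signature(mask: np.ndarray, grids=(4, 8, 16)) -> np.ndarray:
    """Hypothetical sketch: mean-pool an (H, W) soft mask onto each g x g grid and concatenate.

    Assumes H and W are divisible by every grid size.
    """
    h, w = mask.shape
    parts = []
    for g in grids:
        if h % g or w % g:
            raise ValueError(f"mask shape {mask.shape} not divisible by grid {g}")
        # View the mask as g x g blocks of size (h//g, w//g) and average each block.
        cells = mask.reshape(g, h // g, g, w // g).mean(axis=(1, 3))
        parts.append(cells.ravel())
    return np.concatenate(parts)


mask = np.zeros((64, 64))
mask[:, :32] = 1.0  # toy mask: left half is foreground
print(grid_signature(mask).shape)  # (336,)
```

The descriptor size is independent of the mask resolution, which is what keeps the re-ranker input compact regardless of the backbone's output size.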
Model Description
Venice-H1 is a lightweight, backbone-decoupled re-ranking module for Referring Image Segmentation (RIS). It detects when the default query selection fails and selects a better alternative using:
- Multi-Scale Grid Signatures: 4×4, 8×8, 16×16 spatial pooling → 675-dim descriptors
- Failure Gate: binary classifier predicting whether Query 0 is suboptimal
- Gain Predictor: IoU-regression head estimating improvement per alternative query
- Backbone-decoupled: works on cached features, no retraining of DeRIS required
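The gating rule described above can be sketched as a small decision function. This is an illustrative assumption, not the released logic: `select_query`, `gate_threshold`, and `min_gain` are hypothetical names, and the card does not state how the failure gate and the gain predictor are combined or which thresholds are used.

```python
from typing import Sequence


def select_query(p_fail: float, predicted_gains: Sequence[float],
                 gate_threshold: float = 0.5, min_gain: float = 0.0) -> int:
    """Return the index of the query to use; 0 means keep the default (Query 0).

    p_fail: failure-gate probability that Query 0 is suboptimal.
    predicted_gains: gain-predictor IoU improvement per query (index 0 is Query 0).
    """
    if p_fail < gate_threshold:
        return 0  # gate is not confident that Query 0 failed: do not intervene
    # Among the alternatives (indices 1..N-1), pick the largest predicted gain.
    best = max(range(1, len(predicted_gains)),
               key=lambda i: predicted_gains[i], default=0)
    if best == 0 or predicted_gains[best] <= min_gain:
        return 0  # no alternative is predicted to improve on Query 0
    return best


print(select_query(0.9, [0.0, 0.1, 0.4]))  # 2
```

In this sketch Query 0 is the fallback in both branches, so the answer changes only when the gate fires and some alternative is predicted to help.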
Architecture: 3-layer pre-norm Transformer encoder, 8 heads, hidden_dim=512
Parameters: 11,296,258 (11.3M), matching the count reported in the paper
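For orientation, here is a minimal PyTorch sketch of a module with the stated encoder shape (3 pre-norm layers, 8 heads, hidden size 512) running over the N=10 candidate queries. The class name `QueryReRanker`, the linear input projection, reading the gate logit from the Query-0 token, and the single-layer heads are all assumptions. The sketch is not expected to reproduce the 11,296,258 parameter count.

```python
import torch
from torch import nn

NUM_QUERIES = 10  # N = 10 candidate masks from the DeRIS backbone


class QueryReRanker(nn.Module):
    """Hypothetical sketch: pre-norm Transformer encoder over per-query features."""

    def __init__(self, in_dim: int, hidden_dim: int = 512,
                 num_layers: int = 3, num_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden_dim)  # assumed input projection
        layer = nn.TransformerEncoderLayer(
            hidden_dim, num_heads, batch_first=True, norm_first=True  # pre-norm
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.gate_head = nn.Linear(hidden_dim, 1)  # failure-gate logit
        self.gain_head = nn.Linear(hidden_dim, 1)  # per-query IoU-gain regression

    def forward(self, x: torch.Tensor):
        # x: (batch, num_queries, in_dim), one feature vector per candidate mask
        h = self.encoder(self.proj(x))
        gate_logit = self.gate_head(h[:, 0]).squeeze(-1)  # read from the Query-0 token
        gains = self.gain_head(h).squeeze(-1)             # (batch, num_queries)
        return gate_logit, gains


model = QueryReRanker(in_dim=675)
gate_logit, gains = model(torch.randn(2, NUM_QUERIES, 675))
print(gate_logit.shape, gains.shape)  # torch.Size([2]) torch.Size([2, 10])
```

Self-attention across the candidates lets each query's gain estimate depend on the other candidates, which a per-mask scorer could not do.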
Results
On failure cases (where Venice-H1 intervenes), Δ is positive across all 8 evaluation splits.
| Metric | Value |
|---|---|
| Parameters | 11,296,258 |
| Δ_fail (mIoU on failures) | +1.824 |
| AUC (failure detection) | 0.778 |
| Δ_full (overall mIoU) | +0.039 |
| Q0 mIoU | 86.469 |
| Selected mIoU | 86.509 |
| Oracle mIoU | 89.691 |
| Harmful-switch rate | < 0.6% |
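The quantities in this table can be computed from the per-sample IoUs of all candidate queries. A minimal sketch, assuming IoUs in [0, 1], mIoU reported in percent, a failure case defined as Query 0 trailing the best candidate by more than a hypothetical `fail_margin`, and the harmful-switch rate taken over all samples. The card does not specify these definitions, and `rerank_metrics` is a hypothetical name.

```python
from statistics import mean
from typing import Dict, Sequence


def rerank_metrics(ious: Sequence[Sequence[float]], selected: Sequence[int],
                   fail_margin: float = 0.0) -> Dict[str, float]:
    """ious[i][k]: IoU of query k on sample i; selected[i]: query chosen for sample i."""
    q0 = [row[0] for row in ious]
    sel = [row[k] for row, k in zip(ious, selected)]
    oracle = [max(row) for row in ious]
    # Hypothetical failure definition: Query 0 trails the best candidate by > fail_margin.
    fail = [i for i, (a, o) in enumerate(zip(q0, oracle)) if o - a > fail_margin]
    # A switch is harmful if the chosen alternative scores below Query 0.
    harmful = sum(1 for row, k in zip(ious, selected) if k != 0 and row[k] < row[0])
    out = {
        "q0_miou": 100 * mean(q0),
        "selected_miou": 100 * mean(sel),
        "oracle_miou": 100 * mean(oracle),
        "harmful_switch_rate": harmful / len(ious),
    }
    out["delta_full"] = out["selected_miou"] - out["q0_miou"]
    out["delta_fail"] = (100 * (mean(sel[i] for i in fail) - mean(q0[i] for i in fail))
                         if fail else 0.0)
    return out


print(rerank_metrics([[0.9, 0.8], [0.5, 0.9], [0.7, 0.6]], selected=[0, 1, 1]))
```

As a sanity check on the table itself, Selected mIoU minus Q0 mIoU is 86.509 - 86.469 = 0.040, consistent with the reported Δ_full of +0.039 (0.0392 before rounding).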
Failure Gate Analysis
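The failure-detection AUC treats the gate as a binary classifier of whether Query 0 is suboptimal. A minimal sketch using the Mann-Whitney formulation, assuming labels of 1 for failure cases and scores equal to the gate's predicted probabilities. `roc_auc` is a hypothetical helper; for large evaluation sets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random failure (label 1) outscores a random non-failure (label 0).

    Ties count as 1/2. O(P * N) pairwise version, intended for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.778 therefore means that, for a random failure/non-failure pair, the gate ranks the failure higher about 78% of the time.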
Qualitative Results
Re-ranking on RefCOCO val. Each row: input, ground truth, default query (red, fails), Venice-H1 corrected selection (blue). Venice-H1 recovers IoU > 84% in all cases.
Ablation Study
| Configuration | Δ_fail | Gate AUC |
|---|---|---|
| BASE only (no grid) | +1.01 | 0.812 |
| 4Γ4 only | +1.01 | 0.821 |
| 8Γ8 only | +0.87 | 0.790 |
| 16Γ16 only | +1.00 | 0.828 |
| BASE + all grids (ours) | +1.22 | 0.807 |
Medical Cross-Domain Transfer
Zero-shot transfer to MS-CXR (+1.16 mIoU) and M3D-RefSeg-2D (+0.51 mIoU) without fine-tuning.
External Dependencies
| Component | Model | Paper |
|---|---|---|
| Backbone | DeRIS-L | Dai et al. (2025) |
| Visual Encoder | Swin-Large | Liu et al. (2021) |
| Language Encoder | BEiT-3 | Wang et al. (2023) |
| Mask Generator | Mask2Former | Cheng et al. (2022) |
Venice-H1 does not include these weights. You need a running DeRIS-L instance to extract features.
Quick Start
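```python
import torch
from huggingface_hub import hf_hub_download

# Download the trained Venice-H1 checkpoint from the Hugging Face Hub.
ckpt_path = hf_hub_download(repo_id="OdaxAI/venice-h1", filename="venice_h1_deris_l.pt")
# weights_only=False unpickles arbitrary objects (config, metrics); load only trusted files.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

print("Config:", ckpt["config"])
print("Metrics:", ckpt["metrics"])
# ckpt["model"] holds the weights; count the elements of every tensor in it.
print("Parameters:", sum(v.numel() for v in ckpt["model"].values() if hasattr(v, "numel")))
```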
Reproduce paper results (no dataset needed):
```bash
git clone https://github.com/odaxai/Venice-H1.git
cd Venice-H1 && pip install -r requirements.txt && pip install -e .
python reproduce_results.py --verify_only
```
```text
── Architecture Verification ───────────────────
Parameters : 11,296,258 ✓ MATCH
── Paper Cross-Check (RefCOCO val) ─────────────
✓ delta_fail : 1.8244 (paper: 1.824)
✓ auc_fail : 0.7776 (paper: 0.778)
✓ delta_full : 0.0392 (paper: 0.039)
```
Files in Repository
| File | Description |
|---|---|
| venice_h1_deris_l.pt | Trained checkpoint (11.3M params, DeRIS-L backbone) |
| venice_h1_deris_l_metrics.json | Full evaluation metrics |
| config.yaml | Training hyperparameters |
| venice_h1/ | Python package (model code) |
| train.py | Training script |
| evaluate.py | Evaluation script |
| reproduce_results.py | One-command paper reproduction |
| scripts/extract_features.py | Feature extraction from DeRIS-L |
Training Details
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 5e-4 |
| Weight decay | 1e-4 |
| Batch size | 512 |
| Epochs | 20 |
| Scheduler | Cosine + 3 epoch warmup |
| Loss: L_gate | Focal BCE (γ=2.0) |
| Loss: L_gain | Smooth-L1 (λ=5.0) |
| Mixed precision | FP16 |
| Seed | 42 |
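The hyperparameters above map onto standard PyTorch components. A minimal sketch, assuming the total loss is L_gate + λ·L_gain, focal BCE without an α class weight, a linear per-epoch warmup followed by cosine decay to zero, and a placeholder model. `focal_bce`, `total_loss`, and `warmup_cosine` are hypothetical names, and FP16 mixed precision is omitted for brevity.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(42)  # seed from the table


def focal_bce(logits, targets, gamma: float = 2.0):
    """Binary focal loss without alpha weighting: mean of (1 - p_t)^gamma * BCE."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability assigned to the true class
    return ((1 - p_t) ** gamma * ce).mean()


def total_loss(gate_logits, gate_targets, pred_gains, target_gains, lam: float = 5.0):
    # Assumed combination: L = L_gate + lam * L_gain
    l_gate = focal_bce(gate_logits, gate_targets)
    l_gain = F.smooth_l1_loss(pred_gains, target_gains)
    return l_gate + lam * l_gain


def warmup_cosine(epoch: int, warmup: int = 3, total: int = 20) -> float:
    """LR multiplier: linear warmup over `warmup` epochs, then cosine decay toward 0."""
    if epoch < warmup:
        return (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


model = torch.nn.Linear(675, 1)  # placeholder for the re-ranker
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)
# Call scheduler.step() once at the end of each of the 20 epochs.
```

The focal term down-weights samples the gate already classifies confidently, which matters when failures are a minority of queries; the γ and λ defaults mirror the table.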
Citation
```bibtex
@article{savioli2026veniceh1,
  title   = {Venice-H1: Failure-Aware Query Re-Ranking with Multi-Scale Grid Signatures
             for Referring Image Segmentation},
  author  = {Savioli, Nicol\`{o}},
  journal = {arXiv preprint arXiv:2606.22546},
  year    = {2026},
  note    = {OdaxAI Research},
}
```
License
MIT License. © 2026 OdaxAI Research. All research conducted by Nicolò Savioli, Ph.D.