SegTME-UNI2-UperHoVer: Stage 1 (PanNuke)

Stage 1 of 3 in the SegTME-UNI2 progressive pseudo-label curriculum. Trained on the full PanNuke pan-cancer nucleus dataset with human ground-truth labels. This model serves as the supervised seed that generates pseudo-labels for Stage 2.

Part of the SegTME-UNI2 framework (Segmentation of Tumour Microenvironment with UNI2), which targets end-to-end cell segmentation, TME feature extraction, and clinical narrative generation from routine H&E histology. Submitted to Computers in Biology and Medicine.


Framework Overview

SegTME-UNI2 addresses the core bottleneck in computational TME analysis: the gap between the scale of available TCGA H&E image data (1,608,060 patches) and the scale of human-annotated pixel-level labels (7,901 PanNuke images, 189,744 annotated nuclei). The solution is a three-stage progressive pseudo-label curriculum that closes this gap without additional manual annotation, staging domain expansion across resolution scales.

Stage 1: PanNuke (7,901 patches, human GT) → M1  ← this model
     ↓ M1 infers on TCGA-UT Scale 0 → entropy-filtered pseudo-labels
Stage 2: TCGA-UT Scale 0 (271,711 patches, pseudo-labels) → M2
     ↓ M2 infers on TCGA-UT Scales 0–5 → entropy-filtered pseudo-labels
Stage 3: TCGA-UT Scales 0–5 (1,608,061 patches, pseudo-labels) → M3

Critical design principle: Each stage trains a completely independent model, starting from the pretrained UNI2-h backbone and randomly initialised decoders. No weights are transferred between stages; improvement is driven entirely by increasing pseudo-label quality.
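
The entropy filtering that links the stages can be illustrated with a minimal sketch. It assumes filtering is applied per pixel and that uncertain pixels are set to the void label 255 so the loss ignores them. The threshold `max_entropy` and the function name are hypothetical; the card does not state the actual criterion.

```python
import math

IGNORE_INDEX = 255  # void label, matching ignore_index in the loss


def entropy_filter(probs, max_entropy=0.5):
    """Turn per-pixel class probabilities into filtered pseudo-labels.

    probs: list of per-pixel probability vectors (each sums to 1).
    max_entropy: hypothetical threshold in nats, not taken from the card.
    """
    labels = []
    for p in probs:
        # Shannon entropy of the predicted class distribution
        h = -sum(q * math.log(q) for q in p if q > 0.0)
        if h <= max_entropy:
            # Confident pixel: keep the argmax class as the pseudo-label
            labels.append(max(range(len(p)), key=p.__getitem__))
        else:
            # Uncertain pixel: mark as void so it is excluded from training
            labels.append(IGNORE_INDEX)
    return labels


confident = [0.96, 0.01, 0.01, 0.01, 0.005, 0.005]
uncertain = [1 / 6] * 6
print(entropy_filter([confident, uncertain]))  # [0, 255]
```

A uniform distribution over 6 classes has entropy ln 6 ≈ 1.79 nats, so any threshold below that rejects maximally uncertain pixels.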


Architecture: UNI2-UperHoVer

A novel dual-head segmentation model:

Input (3 × 224 × 224)
     ↓
UNI2-h ViT-Giant backbone (pretrained on 100M+ histopathology tiles from 100,000 slides)
     ↓ multi-scale FPN taps at transformer blocks 5, 11, 17, 23
1×1 projection layers → feature pyramid at 4 scales
     ↓
UperNet Decoder (PPM + FPN fusion)
     ├─→ Semantic Head (6-class)       → semantic segmentation map
     └─→ HV Regression Head (2-channel) → horizontal/vertical gradient maps
                                           → watershed-based nuclear instance separation

UNI2-h backbone specs:

  • Architecture: ViT-Giant (patch size 14 px, embedding dim d = 1536, 24 transformer blocks, 24 attention heads, SwiGLU-packed MLP, 8 register tokens)
  • Pretrained on: >100 million tiles from 100,000 whole-slide images, 40+ cancer types
  • Feature pyramid: blocks {5, 11, 17, 23} → channels {256, 512, 1024, 2048} at strides {s/4, s/8, s/16, s/32}
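
The backbone specs above imply a fixed token layout for a 224×224 input, which the following sketch works through. It assumes one class token in addition to the 8 register tokens, and that these prefix tokens come before the patch tokens in the sequence. Both are assumptions about the backbone implementation, and the helper names are hypothetical.

```python
def token_layout(image_size=224, patch_size=14, register_tokens=8, class_tokens=1):
    """Return (grid side, patch tokens, total tokens) for a square ViT input."""
    if image_size % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    side = image_size // patch_size
    patches = side * side
    return side, patches, patches + register_tokens + class_tokens


def tokens_to_grid(tokens, side, prefix_tokens):
    """Drop prefix tokens and fold the remaining sequence into side x side rows."""
    patch_tokens = tokens[prefix_tokens:]
    return [patch_tokens[r * side:(r + 1) * side] for r in range(side)]


side, patches, total = token_layout()
print(side, patches, total)  # 16 256 265

# Stand-in token ids; a real tap would hold 1536-dimensional vectors instead
grid = tokens_to_grid(list(range(total)), side, prefix_tokens=total - patches)
print(len(grid), len(grid[0]))  # 16 16
```

Under these assumptions each tapped block yields a 16×16 spatial map of 1536-dimensional features, which the 1×1 projection layers then map to the listed channel widths.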

Dual decoder heads (parameters not shared between heads):

  • Semantic head: num_labels=6, hidden_size=768 → 6-class per-pixel output
  • HV regression head: num_labels=2, hidden_size=768 → horizontal (ch 0) and vertical (ch 1) gradient maps

Loss function:

L_total = L_sem + λ · L_hv

L_sem  = cross-entropy over valid pixels (ignore_index=255)
L_hv   = L_MSE + 2 · L_MSGE   (foreground pixels only)
λ      = 1.0
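
As a rough illustration of how the HV term combines its two parts, the sketch below evaluates L_MSE + 2 · L_MSGE for the horizontal channel on a tiny example. The gradient is a plain forward difference along the columns, chosen for brevity; this is an assumption, and implementations in the HoVer-Net family typically use a Sobel-style kernel. All function names are hypothetical.

```python
def forward_diff_x(channel):
    """Horizontal forward difference; the last column is set to 0."""
    return [[row[c + 1] - row[c] if c + 1 < len(row) else 0.0
             for c in range(len(row))] for row in channel]


def masked_mse(a, b, mask):
    """Mean squared error over pixels where mask is 1."""
    num = sum((x - y) ** 2
              for ra, rb, rm in zip(a, b, mask)
              for x, y, m in zip(ra, rb, rm) if m)
    den = sum(m for rm in mask for m in rm)
    return num / den if den else 0.0


def hv_loss(pred, target, mask):
    """L_hv = L_MSE + 2 * L_MSGE, restricted to foreground pixels."""
    l_mse = masked_mse(pred, target, mask)
    l_msge = masked_mse(forward_diff_x(pred), forward_diff_x(target), mask)
    return l_mse + 2.0 * l_msge


def total_loss(l_sem, l_hv, lam=1.0):
    """L_total = L_sem + lambda * L_hv."""
    return l_sem + lam * l_hv


target = [[-1.0, 0.0, 1.0]]
pred = [[-1.0, 0.5, 1.0]]
mask = [[1, 1, 1]]
l_hv = hv_loss(pred, target, mask)
print(round(l_hv, 4))                   # 0.4167
print(round(total_loss(0.1, l_hv), 4))  # 0.5167
```

For the vertical channel the same difference would be taken along the rows instead of the columns.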

Dynamic HV target synthesis: HV maps are generated on-the-fly from semantic labels using connected-component labelling and per-nucleus centroid computation, so no instance-level annotations are required at any stage.
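
The synthesis step can be sketched in plain Python as follows. The sketch assumes that all non-background, non-void pixels count as foreground, that components are 4-connected, and that offsets from the centroid are scaled to [-1, 1] by the largest absolute offset in each component. None of these details is stated in the card, and the function name is hypothetical.

```python
from collections import deque


def hv_targets(sem, background=0, void=255):
    """Synthesise horizontal/vertical maps from a 2-D semantic label grid."""
    h, w = len(sem), len(sem[0])
    hmap = [[0.0] * w for _ in range(h)]
    vmap = [[0.0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or sem[r0][c0] in (background, void):
                continue
            # Breadth-first flood fill collects one 4-connected component
            comp, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                comp.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < h and 0 <= cc < w and not seen[rr][cc]
                            and sem[rr][cc] not in (background, void)):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            # Centroid of the component, then offsets scaled to [-1, 1]
            cy = sum(r for r, _ in comp) / len(comp)
            cx = sum(c for _, c in comp) / len(comp)
            max_dx = max(abs(c - cx) for _, c in comp) or 1.0
            max_dy = max(abs(r - cy) for r, _ in comp) or 1.0
            for r, c in comp:
                hmap[r][c] = (c - cx) / max_dx
                vmap[r][c] = (r - cy) / max_dy
    return hmap, vmap


sem = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0],
    [2, 0, 0, 0, 0],
]
hmap, vmap = hv_targets(sem)
print(hmap[0])                   # [0.0, -1.0, 0.0, 1.0, 0.0]
print(vmap[2][0], vmap[3][0])    # -1.0 1.0
```

One consequence of deriving instances from connected components is that touching nuclei merge into a single component and share one centroid.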


Cell Classes (PanNuke Ontology)

| Class | Label | Biological meaning |
| --- | --- | --- |
| 0 | Background | Non-cellular tissue / void |
| 1 | Neoplastic | Tumour nuclei |
| 2 | Inflammatory | Immune cells (lymphocytes, neutrophils, etc.) |
| 3 | Connective | Stromal / connective tissue nuclei |
| 4 | Dead | Necrotic / apoptotic nuclei |
| 5 | Non-neoplastic Epithelial | Normal epithelial nuclei |

Training Configuration: Stage 1

| Hyperparameter | Value |
| --- | --- |
| Training dataset | PanNuke: 7,901 patches, 19 tissue types, 189,744 annotated nuclei |
| Training split | 80% train (6,321 patches), 20% held-out val (1,580 patches) |
| Input resolution | 256×256 px at 0.25 µm/px → resized to 224×224 for backbone |
| Backbone | UNI2-h (frozen-then-fine-tuned) |
| Optimiser | AdamW (β₁=0.9, β₂=0.999, weight_decay=1×10⁻²) |
| Learning rate | 5×10⁻⁵, linear decay: LR(t) = 5×10⁻⁵ × (T−t)/T (no warmup) |
| Per-device batch size | 8 |
| Gradient accumulation | 1 step (effective batch = 8 per GPU, 64 across all GPUs) |
| Number of GPUs | 8 × NVIDIA A100 (DDP) |
| Training epochs | 249 |
| Total optimiser steps | 24,651 |
| Mixed precision | bfloat16 |
| Compilation | torch.compile (inductor backend) |
| Eval frequency | every 500 steps |
| Checkpoint metric | Validation mean IoU (↑) |
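
The step count and learning-rate schedule in the configuration can be checked with a few lines of arithmetic. The sketch assumes the last partial batch of each epoch is kept, which is consistent with the reported totals; the function names are hypothetical.

```python
import math


def steps_per_epoch(num_samples, per_device_batch, num_gpus, grad_accum=1):
    """Optimiser steps per epoch when the last partial batch is kept."""
    global_batch = per_device_batch * num_gpus * grad_accum
    return math.ceil(num_samples / global_batch)


def linear_decay_lr(step, total_steps, base_lr=5e-5):
    """LR(t) = base_lr * (T - t) / T, with no warmup."""
    return base_lr * ((total_steps - step) / total_steps)


per_epoch = steps_per_epoch(6321, per_device_batch=8, num_gpus=8)
total = per_epoch * 249
print(per_epoch, total)                 # 99 24651
print(linear_decay_lr(0, total))        # 5e-05
print(linear_decay_lr(total, total))    # 0.0
```

With 6,321 training patches and a global batch of 64, each epoch takes 99 steps, and 249 epochs give the 24,651 steps listed in the configuration.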

Augmentation (each with p=0.5): colour jitter (brightness/contrast/saturation ±20%, hue ±5%), HLS-space multiplicative perturbation ∈[0.9, 1.1], horizontal flip, vertical flip.


Results

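| Checkpoint | Epoch | Step | Val mean IoU | Eval loss |
| --- | --- | --- | --- | --- |
| | 1 | 99 | 0.408 | 0.391 |
| | 25 | 2,475 | 0.784 | 0.072 |
| | 50 | 4,950 | 0.808 | 0.058 |
| | 100 | 9,900 | 0.861 | 0.041 |
| | 150 | 14,850 | 0.893 | 0.033 |
| | 200 | 19,800 | 0.917 | 0.027 |
| best | 249 | 24,651 | 0.9313 | 0.025 |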

Evaluation protocol: mIoU is the macro-averaged Jaccard index across all 6 classes on the PanNuke 20% held-out split (human ground-truth labels). Void pixels (label=255) are excluded. The best checkpoint is the final one; the model was still improving at the end of training.
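
A minimal sketch of the metric described above, on flattened label sequences. It leaves classes that appear in neither prediction nor target out of the average, which is one common convention and an assumption here; the function name is hypothetical.

```python
def mean_iou(pred, target, num_classes=6, ignore_index=255):
    """Macro-averaged Jaccard index over classes, skipping void pixels.

    pred, target: flat sequences of integer labels of equal length.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # void pixels do not contribute
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 3]
print(round(mean_iou(pred, target), 4))  # 0.7222
```

In this example the per-class IoUs are 1/2, 2/3 and 1 for classes 0, 1 and 2, and the final pixel is ignored because its target is void.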


Inference

# MPP normalisation before inference
scale = mpp_input / 0.314   # target ~0.35 µm/px

# Tiling
# Tile 224×224 with 50% overlap (stride 112 px); zero-pad to multiple of 14
# Stitch semantic map (foreground-priority) and HV map (last-write)

# Watershed instance separation
# Energy = HV magnitude; EDT seeds; compactness=0.01
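
The rescaling and tiling steps outlined above might look like the following sketch along one image axis. The handling of the last tile (shifted back so it ends at the image edge) is an assumption, as are the helper names; the divisor 0.314 is taken from the snippet above.

```python
def rescale_factor(mpp_input, mpp_target=0.314):
    """Resize factor that brings an image from mpp_input to mpp_target."""
    return mpp_input / mpp_target


def pad_to_multiple(size, multiple=14):
    """Smallest length >= size that is a multiple of `multiple`."""
    return -(-size // multiple) * multiple


def tile_origins(length, tile=224, stride=112):
    """Start offsets of tiles covering [0, length) along one axis."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        # Shift a final tile back so it ends exactly at the image edge
        origins.append(length - tile)
    return origins


print(tile_origins(500))      # [0, 112, 224, 276]
print(pad_to_multiple(500))   # 504
print(round(rescale_factor(0.5), 3))  # 1.592
```

A 2-D tiling is the Cartesian product of the origins along each axis; a slide scanned at 0.5 µm/px would be upsampled by roughly 1.59× before tiling.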

The model expects 224×224 RGB patches normalised with ImageNet statistics (μ=(0.485, 0.456, 0.406), σ=(0.229, 0.224, 0.225)).
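
For reference, the normalisation can be written out per pixel as below. The sketch assumes 8-bit RGB input that is first scaled to [0, 1], as is conventional with ImageNet statistics; the helper names are hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalise_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to standardised channel values."""
    return tuple((v / 255.0 - m) / s
                 for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def normalise_image(image):
    """Apply normalise_pixel to every pixel of a row-major image."""
    return [[normalise_pixel(px) for px in row] for row in image]


white = normalise_pixel((255, 255, 255))
print(tuple(round(v, 3) for v in white))  # (2.249, 2.429, 2.64)
```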


Repository Contents

| File | Description |
| --- | --- |
| model.safetensors | Model weights: UNI2-h backbone + both decoder heads (~3.2 GB) |
| trainer_state.json | Full training log: per-step mIoU, loss, learning rate |
| training_args.bin | HuggingFace Trainer configuration |

Related Models in This Series

| Model | Stage | Training data | Best val mIoU | Eval basis |
| --- | --- | --- | --- | --- |
| This model | 1 | PanNuke: 7,901 patches (GT) | 0.9313 | Human GT |
| TCGA-UT-0 | 2 | TCGA-UT Scale 0: 271K patches (PL) | 0.8197 | PL-val† |
| TCGA-UT-012345 | 3 | TCGA-UT Scales 0–5: 1.6M patches (PL) | 0.7724 | PL-val† |

†PL-val = pseudo-label validation (self-consistency on model-generated labels, not human GT). Cross-domain evaluation on PanNuke GT for M2/M3 is pending (requires dedicated inference run).


Data Availability


Citation

This model is part of the SegTME-UNI2 framework (manuscript under review, Computers in Biology and Medicine). Please also cite the UNI2 foundation model and PanNuke dataset:

@article{chen2024uni2,
  title={Towards a General-Purpose Foundation Model for Computational Pathology},
  author={Chen, Richard J and others},
  journal={Nature Medicine},
  year={2024}
}
@article{gamper2020pannuke,
  title={PanNuke Dataset Extension, Insights and Baselines},
  author={Gamper, Jevgenij and others},
  year={2020}
}