SegTME-UNI2-UperHoVer: Stage 1 (PanNuke)
Stage 1 of 3 in the SegTME-UNI2 progressive pseudo-label curriculum. Trained on the full PanNuke pan-cancer nucleus dataset with human ground-truth labels. This model serves as the supervised seed that generates pseudo-labels for Stage 2.
Part of the SegTME-UNI2 framework (Segmentation of Tumour Microenvironment with UNI2), targeting end-to-end cell segmentation, TME feature extraction, and clinical narrative generation from routine H&E histology. Submitted to Computers in Biology and Medicine.
Framework Overview
SegTME-UNI2 addresses the core bottleneck in computational TME analysis: the gap between the scale of available TCGA H&E image data (1,608,060 patches) and the scale of human-annotated pixel-level labels (7,901 PanNuke images, 189,744 annotated nuclei). The solution is a three-stage progressive pseudo-label curriculum that closes this gap without additional manual annotation, staging domain expansion across resolution scales.
Stage 1: PanNuke (7,901 patches, human GT) → M1 ← this model
↓ M1 infers on TCGA-UT Scale 0 → entropy-filtered pseudo-labels
Stage 2: TCGA-UT Scale 0 (271,711 patches, pseudo-labels) → M2
↓ M2 infers on TCGA-UT Scales 0–5 → entropy-filtered pseudo-labels
Stage 3: TCGA-UT Scales 0–5 (1,608,061 patches, pseudo-labels) → M3
Critical design principle: each stage trains an independent model, starting again from the UNI2-h pretrained backbone with randomly initialised decoders. No weights are transferred between stages; improvement is driven entirely by increasing pseudo-label quality.
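The card does not state how the entropy filter is computed or thresholded. As a minimal sketch only, assuming per-pixel Shannon entropy of the softmax output averaged over a patch, with a hypothetical threshold `max_mean_entropy` (the function names and the 0.5 value are illustrative, not taken from the training code):

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_patch_entropy(prob_map):
    """Average per-pixel entropy over a patch given as rows of probability vectors."""
    values = [pixel_entropy(px) for row in prob_map for px in row]
    return sum(values) / len(values)

def keep_patch(prob_map, max_mean_entropy=0.5):
    """Hypothetical filter: keep a pseudo-labelled patch only if mean entropy is low."""
    return mean_patch_entropy(prob_map) <= max_mean_entropy

# Two toy 2x2 patches with 6-class probabilities per pixel.
confident = [[[0.98, 0.01, 0.01, 0.0, 0.0, 0.0]] * 2] * 2
uncertain = [[[1 / 6] * 6] * 2] * 2

print(keep_patch(confident), keep_patch(uncertain))
```

A patch-level mean is only one possible aggregation; a pixel-level mask that sets uncertain pixels to the void label (255) would fit the same description.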
Architecture: UNI2-UperHoVer
A novel dual-head segmentation model:
Input (3 × 224 × 224)
↓
UNI2-h ViT-Giant backbone (pretrained on 100M+ histopathology tiles from 100,000 slides)
↓ multi-scale FPN taps at transformer blocks 5, 11, 17, 23
1×1 projection layers → feature pyramid at 4 scales
↓
UperNet Decoder (PPM + FPN fusion)
├── Semantic Head (6-class) → semantic segmentation map
└── HV Regression Head (2-channel) → horizontal/vertical gradient maps
↓ watershed-based nuclear instance separation
UNI2-h backbone specs:
- Architecture: ViT-Giant (patch size 14 px, embedding dim d = 1536, 24 transformer blocks, 24 attention heads, SwiGLU-packed MLP, 8 register tokens)
- Pretrained on: >100 million tiles from 100,000 whole-slide images, 40+ cancer types
- Feature pyramid: blocks {5, 11, 17, 23} → channels {256, 512, 1024, 2048} at strides {s/4, s/8, s/16, s/32}
Dual decoder heads (parameters not shared between heads):
- Semantic head: `num_labels=6`, `hidden_size=768` → 6-class per-pixel output
- HV regression head: `num_labels=2`, `hidden_size=768` → horizontal (ch 0) and vertical (ch 1) gradient maps
Loss function:
L_total = L_sem + λ · L_hv
L_sem = cross-entropy over valid pixels (ignore_index=255)
L_hv = L_MSE + 2 · L_MSGE (foreground pixels only)
λ = 1.0
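The combination above can be illustrated with a small pure-Python sketch. It assumes that the gradient error (MSGE) uses simple forward differences and that the two HV channels are averaged; the actual training code may use a different gradient operator (HoVer-Net uses Sobel-style kernels). All function names here are illustrative.

```python
def masked_mean_sq(a, b, mask):
    """Mean squared difference over pixels where mask is 1 (0.0 if the mask is empty)."""
    diffs = [(a[y][x] - b[y][x]) ** 2
             for y in range(len(a)) for x in range(len(a[0])) if mask[y][x]]
    return sum(diffs) / len(diffs) if diffs else 0.0

def grad_x(m):
    """Forward difference along x; the last column is set to 0."""
    return [[row[x + 1] - row[x] if x + 1 < len(row) else 0.0
             for x in range(len(row))] for row in m]

def grad_y(m):
    """Forward difference along y; the last row is set to 0."""
    h = len(m)
    return [[m[y + 1][x] - m[y][x] if y + 1 < h else 0.0
             for x in range(len(m[0]))] for y in range(h)]

def hv_loss(pred_h, pred_v, tgt_h, tgt_v, fg_mask, msge_weight=2.0):
    """L_hv = L_MSE + 2 * L_MSGE, evaluated on foreground pixels only."""
    l_mse = 0.5 * (masked_mean_sq(pred_h, tgt_h, fg_mask)
                   + masked_mean_sq(pred_v, tgt_v, fg_mask))
    l_msge = 0.5 * (masked_mean_sq(grad_x(pred_h), grad_x(tgt_h), fg_mask)
                    + masked_mean_sq(grad_y(pred_v), grad_y(tgt_v), fg_mask))
    return l_mse + msge_weight * l_msge

def total_loss(l_sem, l_hv, lam=1.0):
    """L_total = L_sem + lambda * L_hv."""
    return l_sem + lam * l_hv

# One-row toy example: the prediction is all zeros, the target ramps from -1 to 1.
zeros = [[0.0, 0.0, 0.0]]
ramp = [[-1.0, 0.0, 1.0]]
mask = [[1, 1, 1]]
example_hv = hv_loss(zeros, zeros, ramp, zeros, mask)
```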
Dynamic HV target synthesis: HV maps are generated on-the-fly from semantic labels using connected-component labelling and per-nucleus centroid computation, so no instance-level annotations are required at any stage.
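A minimal sketch of this synthesis step, under stated assumptions: components are 4-connected and restricted to pixels of the same class, and offsets from the centroid are scaled to [-1, 1] by the largest absolute offset in each nucleus. The connectivity and the normalisation are illustrative choices, not confirmed details of the training code.

```python
from collections import deque

IGNORE = 255  # void label, never assigned to a nucleus

def connected_nuclei(semantic):
    """Group 4-connected pixels that share the same non-background class."""
    h, w = len(semantic), len(semantic[0])
    seen = [[False] * w for _ in range(h)]
    nuclei = []
    for y in range(h):
        for x in range(w):
            cls = semantic[y][x]
            if cls in (0, IGNORE) or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and semantic[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            nuclei.append(pixels)
    return nuclei

def hv_targets(semantic):
    """Return (horizontal, vertical) maps of centroid offsets scaled to [-1, 1]."""
    h, w = len(semantic), len(semantic[0])
    hmap = [[0.0] * w for _ in range(h)]
    vmap = [[0.0] * w for _ in range(h)]
    for pixels in connected_nuclei(semantic):
        cy = sum(p[0] for p in pixels) / len(pixels)
        cx = sum(p[1] for p in pixels) / len(pixels)
        # Fall back to 1.0 so single-pixel or single-row nuclei do not divide by zero.
        max_dx = max(abs(x - cx) for _, x in pixels) or 1.0
        max_dy = max(abs(y - cy) for y, _ in pixels) or 1.0
        for y, x in pixels:
            hmap[y][x] = (x - cx) / max_dx
            vmap[y][x] = (y - cy) / max_dy
    return hmap, vmap

semantic = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 2],
]
hmap, vmap = hv_targets(semantic)
```

Because instances are recovered from the semantic map alone, touching nuclei of the same class merge into one component in this sketch.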
Cell Classes (PanNuke Ontology)
| Class | Label | Biological meaning |
|---|---|---|
| 0 | Background | Non-cellular tissue / void |
| 1 | Neoplastic | Tumour nuclei |
| 2 | Inflammatory | Immune cells (lymphocytes, neutrophils, etc.) |
| 3 | Connective | Stromal / connective tissue nuclei |
| 4 | Dead | Necrotic / apoptotic nuclei |
| 5 | Non-neoplastic Epithelial | Normal epithelial nuclei |
Training Configuration: Stage 1
| Hyperparameter | Value |
|---|---|
| Training dataset | PanNuke: 7,901 patches, 19 tissue types, 189,744 annotated nuclei |
| Training split | 80% train (6,321 patches), 20% held-out val (1,580 patches) |
| Input resolution | 256×256 px at 0.25 µm/px, resized to 224×224 for backbone |
| Backbone | UNI2-h (frozen-then-fine-tuned) |
| Optimiser | AdamW (β₁=0.9, β₂=0.999, weight_decay=1×10⁻²) |
| Learning rate | 5×10⁻⁵, linear decay: LR(t) = 5×10⁻⁵ × (T−t)/T (no warmup) |
| Per-device batch size | 8 |
| Gradient accumulation | 1 step (effective batch = 8 per GPU) |
| Number of GPUs | 8 × NVIDIA A100 (DDP) |
| Training epochs | 249 |
| Total optimizer steps | 24,651 |
| Mixed precision | bfloat16 |
| Compilation | torch.compile (inductor backend) |
| Eval frequency | every 500 steps |
| Checkpoint metric | Validation mean IoU (↑) |
Augmentation (each with p=0.5): colour jitter (brightness/contrast/saturation ±20%, hue ±5%), HLS-space multiplicative perturbation ∈ [0.9, 1.1], horizontal flip, vertical flip.
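The step count and learning-rate schedule in the configuration table can be cross-checked with a few lines of arithmetic. This sketch assumes the last partial batch of each epoch is kept (rounding up), which is the choice that reproduces the reported 24,651 optimiser steps; the constant and function names are illustrative.

```python
import math

TRAIN_PATCHES = 6_321
PER_DEVICE_BATCH = 8
NUM_GPUS = 8
GRAD_ACCUM = 1
EPOCHS = 249
BASE_LR = 5e-5

# Patches consumed per optimiser step across all GPUs.
global_batch = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM
# Assumption: the final partial batch is not dropped, hence ceil.
steps_per_epoch = math.ceil(TRAIN_PATCHES / global_batch)
total_steps = steps_per_epoch * EPOCHS

def linear_decay_lr(step, total=total_steps, base_lr=BASE_LR):
    """LR(t) = base_lr * (T - t) / T, with no warmup."""
    return base_lr * (total - step) / total

print(global_batch, steps_per_epoch, total_steps)
print(linear_decay_lr(0), linear_decay_lr(total_steps // 2))
```

With 99 steps per epoch, the step values in the results table (for example 2,475 at epoch 25) follow directly as 99 × epoch.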
Results
| Checkpoint | Epoch | Step | Val mean IoU | Eval loss |
|---|---|---|---|---|
| | 1 | 99 | 0.408 | 0.391 |
| | 25 | 2,475 | 0.784 | 0.072 |
| | 50 | 4,950 | 0.808 | 0.058 |
| | 100 | 9,900 | 0.861 | 0.041 |
| | 150 | 14,850 | 0.893 | 0.033 |
| | 200 | 19,800 | 0.917 | 0.027 |
| best | 249 | 24,651 | 0.9313 | 0.025 |
Evaluation protocol: mIoU is macro-averaged Jaccard across all 6 classes on the PanNuke 20% held-out split (human ground-truth labels). Void pixels (label=255) are excluded. The best checkpoint is the final checkpoint; the model was still improving at the end of training.
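A minimal sketch of a macro-averaged Jaccard score with void pixels excluded is shown here. How classes absent from both prediction and target are handled, and whether counts are accumulated per image or over the whole split, is not stated in this card; the sketch skips absent classes and scores a single label map.

```python
def mean_iou(pred, target, num_classes=6, ignore_index=255):
    """Macro-averaged IoU over classes that occur, skipping void target pixels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t == ignore_index:
                continue  # void pixels do not count for any class
            if p == t:
                inter[t] += 1
                union[t] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

target = [[0, 0, 1],
          [1, 255, 2]]
pred = [[0, 1, 1],
        [1, 0, 2]]
score = mean_iou(pred, target)  # per-class IoU: 1/2, 2/3 and 1
```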
Inference
# MPP normalisation before inference
scale = mpp_input / 0.314 # target ~0.35 µm/px
# Tiling
# Tile 224×224 with 50% overlap (stride 112 px); zero-pad to multiple of 14
# Stitch semantic map (foreground-priority) and HV map (last-write)
# Watershed instance separation
# Energy = HV magnitude; EDT seeds; compactness=0.01
The model expects 224×224 RGB patches normalised with ImageNet statistics (μ=(0.485, 0.456, 0.406), σ=(0.229, 0.224, 0.225)).
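The tiling and normalisation steps can be sketched in plain Python as follows. The helper names are hypothetical, and clamping the last tile to the image edge is an assumption; the released inference code may handle borders differently.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def padded_length(length, tile=224, multiple=14):
    """Round up to a multiple of the ViT patch size, and to at least one full tile."""
    padded = -(-length // multiple) * multiple  # ceiling division
    return max(padded, tile)

def tile_origins(length, tile=224, stride=112):
    """Start offsets of overlapping tiles covering [0, length).

    Assumption: the final tile is clamped so that it ends exactly at the edge.
    """
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)
    return origins

def normalise_pixel(rgb):
    """Map an 8-bit RGB pixel to ImageNet-normalised floats."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

width = padded_length(500)  # 500 px padded up to a multiple of 14
xs = tile_origins(width)    # x offsets of the 224 px tiles along that axis
print(width, xs)
```

The same `tile_origins` call applies independently to the vertical axis, and the Cartesian product of the two offset lists gives the tile grid.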
Repository Contents
| File | Description |
|---|---|
| `model.safetensors` | Model weights: UNI2-h backbone + both decoder heads (~3.2 GB) |
| `trainer_state.json` | Full training log: per-step mIoU, loss, learning rate |
| `training_args.bin` | HuggingFace Trainer configuration |
Related Models in This Series
| Model | Stage | Training data | Best val mIoU | Eval basis |
|---|---|---|---|---|
| This model | 1 | PanNuke, 7,901 patches (GT) | 0.9313 | Human GT |
| TCGA-UT-0 | 2 | TCGA-UT Scale 0, 271K patches (PL) | 0.8197 | PL-val† |
| TCGA-UT-012345 | 3 | TCGA-UT Scales 0–5, 1.6M patches (PL) | 0.7724 | PL-val† |
† PL-val = pseudo-label validation (self-consistency on model-generated labels, not human GT). Cross-domain evaluation on PanNuke GT for M2/M3 is pending (requires a dedicated inference run).
Data Availability
- PanNuke dataset: https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke
- UNI2-h backbone: https://huggingface.co/MahmoodLab/uni2-h
- Pseudo-labelled TCGA-UT dataset: https://huggingface.co/datasets/mizjaggy18/tcga-ut-cell-instance-semantic
- Training code: to be released upon paper acceptance
Citation
This model is part of the SegTME-UNI2 framework (manuscript under review, Computers in Biology and Medicine). Please also cite the UNI2 foundation model and PanNuke dataset:
@article{chen2024uni2,
title={Towards a General-Purpose Foundation Model for Computational Pathology},
author={Chen, Richard J and others},
journal={Nature Medicine},
year={2024}
}
@article{gamper2020pannuke,
title={PanNuke Dataset Extension, Insights and Baselines},
author={Gamper, Jevgenij and others},
year={2020}
}