SegTME-UNI2-UperHoVer — Stage 1 (PanNuke)

Stage 1 of 3 in the SegTME-UNI2 progressive pseudo-label curriculum. Trained on the full PanNuke pan-cancer nucleus dataset with human ground-truth labels. This model serves as the supervised seed that generates pseudo-labels for Stage 2.

This model is part of the SegTME-UNI2 framework (Segmentation of Tumour Microenvironment with UNI2), which targets end-to-end cell segmentation, TME feature extraction, and clinical narrative generation from routine H&E histology. The manuscript has been submitted to Computers in Biology and Medicine.

Framework Overview

SegTME-UNI2 addresses the core bottleneck in computational TME analysis: the gap between the scale of available TCGA H&E image data (1,608,060 patches) and the scale of human-annotated pixel-level labels (7,901 PanNuke images, 189,744 annotated nuclei). The solution is a three-stage progressive pseudo-label curriculum that closes this gap without additional manual annotation, staging domain expansion across resolution scales.

Stage 1: PanNuke (7,901 patches, human GT) → M1  ← this model
     ↓ M1 infers on TCGA-UT Scale 0 → entropy-filtered pseudo-labels
Stage 2: TCGA-UT Scale 0 (271,711 patches, pseudo-labels) → M2
     ↓ M2 infers on TCGA-UT Scales 0–5 → entropy-filtered pseudo-labels
Stage 3: TCGA-UT Scales 0–5 (1,608,061 patches, pseudo-labels) → M3
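
The entropy filtering step in the curriculum above can be illustrated with a minimal sketch. It assumes filtering is applied per pixel to softmax outputs and that rejected pixels are set to the void label 255; the function name `entropy_filter` and the threshold `max_entropy` are hypothetical, and the actual criterion used to build the pseudo-labels may differ.

```python
import math

IGNORE_INDEX = 255  # void label, matching ignore_index in the loss


def entropy_filter(probs, max_entropy):
    """Turn per-pixel class probabilities into a filtered pseudo-label map.

    probs: H x W x C nested lists of softmax probabilities.
    max_entropy: hypothetical threshold in nats (not a value from the model card).
    """
    labels = []
    for row in probs:
        out_row = []
        for p in row:
            # Shannon entropy of the class distribution at this pixel
            h = -sum(q * math.log(q) for q in p if q > 0.0)
            # Keep the argmax label only when the prediction is confident
            out_row.append(p.index(max(p)) if h <= max_entropy else IGNORE_INDEX)
        labels.append(out_row)
    return labels


probs = [[[0.97, 0.01, 0.01, 0.01, 0.0, 0.0],     # confident background pixel
          [0.30, 0.30, 0.20, 0.10, 0.05, 0.05]]]  # ambiguous pixel
pseudo = entropy_filter(probs, max_entropy=0.5)
```

Here the first pixel keeps its label (class 0) and the second is marked void, so it would not contribute to the cross-entropy loss in the next stage.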

Critical design principle: Each stage trains a completely independent model from scratch (UNI2-h pretrained backbone + randomly initialised decoders). No weights are transferred between stages — improvement is driven entirely by increasing pseudo-label quality.

Architecture — UNI2-UperHoVer

A novel dual-head segmentation model:

Input (3 × 224 × 224)
     ↓
UNI2-h ViT-Giant backbone (pretrained on 100M+ histopathology tiles from 100,000 slides)
     ↓ multi-scale FPN taps at transformer blocks 5, 11, 17, 23
1×1 projection layers → feature pyramid at 4 scales
     ↓
UperNet Decoder (PPM + FPN fusion)
     ├─→ Semantic Head (6-class)       → semantic segmentation map
     └─→ HV Regression Head (2-channel) → horizontal/vertical gradient maps
                                           → watershed-based nuclear instance separation

UNI2-h backbone specs:

Architecture: ViT-Giant (patch size 14 px, embedding dim d = 1536, 24 transformer blocks, 24 attention heads, SwiGLU-packed MLP, 8 register tokens)
Pretrained on: >100 million tiles from 100,000 whole-slide images, 40+ cancer types
Feature pyramid: blocks {5, 11, 17, 23} → channels {256, 512, 1024, 2048} at strides {s/4, s/8, s/16, s/32}
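
As a quick check on the backbone geometry listed above, the following sketch derives the token grid and the tap spacing from the stated patch size, input size, and block count. The single class token is an assumption (standard for ViTs, not stated in this card).

```python
PATCH_SIZE = 14
INPUT_SIDE = 224
NUM_BLOCKS = 24
REGISTER_TOKENS = 8
CLS_TOKENS = 1  # assumption: one class token, as in standard ViTs

grid_side = INPUT_SIDE // PATCH_SIZE      # tokens per image side
patch_tokens = grid_side ** 2             # spatial tokens fed to the FPN taps
sequence_length = patch_tokens + CLS_TOKENS + REGISTER_TOKENS

# Taps at every sixth block (0-indexed), matching blocks {5, 11, 17, 23}
tap_blocks = [i for i in range(NUM_BLOCKS) if (i + 1) % 6 == 0]
```

Every tap therefore sees the same 16×16 token grid; the multi-scale pyramid has to come from the projection and resampling layers rather than from the transformer itself.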

Dual decoder heads (parameters not shared between heads):

Semantic head: num_labels=6, hidden_size=768 → 6-class per-pixel output
HV regression head: num_labels=2, hidden_size=768 → horizontal (ch 0) and vertical (ch 1) gradient maps

Loss function:

L_total = L_sem + λ · L_hv

L_sem  = cross-entropy over valid pixels (ignore_index=255)
L_hv   = L_MSE + 2 · L_MSGE   (foreground pixels only)
λ      = 1.0
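
Written out in full, the loss could take the following form. This assumes the HoVer-Net-style definitions of the mean squared error and mean squared gradient error; the set $F$ of foreground pixels, the targets $h_p$, the predictions $\hat{h}_p$, and the $1/|F|$ normalisation are notation introduced here for illustration, with channel 0 horizontal and channel 1 vertical as in the head description above.

```latex
\begin{aligned}
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{sem}} + \lambda\,\mathcal{L}_{\text{hv}},
  \qquad \lambda = 1.0,\\
\mathcal{L}_{\text{hv}} &= \mathcal{L}_{\text{MSE}} + 2\,\mathcal{L}_{\text{MSGE}},\\
\mathcal{L}_{\text{MSE}} &= \frac{1}{|F|}\sum_{p \in F}
  \bigl\lVert \hat{h}_p - h_p \bigr\rVert_2^2,\\
\mathcal{L}_{\text{MSGE}} &= \frac{1}{|F|}\sum_{p \in F}
  \Bigl[\bigl(\partial_x \hat{h}^{(0)}_p - \partial_x h^{(0)}_p\bigr)^2
      + \bigl(\partial_y \hat{h}^{(1)}_p - \partial_y h^{(1)}_p\bigr)^2\Bigr].
\end{aligned}
```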

Dynamic HV target synthesis: HV maps are generated on-the-fly from semantic labels using connected-component labelling and per-nucleus centroid computation — no instance-level annotations are required at any stage.
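
A minimal sketch of how such targets could be synthesised from a semantic map is shown below. It is an illustration, not the training code: the 8-connectivity, the restriction of each component to a single class, and the per-nucleus scaling of offsets to [-1, 1] are all assumptions, and `synthesize_hv` is a hypothetical name.

```python
from collections import deque


def synthesize_hv(sem, background=0, ignore=255):
    """Build (horizontal, vertical) targets from a 2-D semantic label map.

    Assumptions (not confirmed by the model card): 8-connectivity, components
    restricted to one class, offsets scaled to [-1, 1] per nucleus.
    """
    h, w = len(sem), len(sem[0])
    hv = [[(0.0, 0.0) for _ in range(w)] for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            cls = sem[y0][x0]
            if seen[y0][x0] or cls in (background, ignore):
                continue
            # Breadth-first flood fill collects one connected nucleus
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and sem[ny][nx] == cls):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Centroid of the nucleus
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            # Largest absolute offset per axis; "or 1.0" avoids division by zero
            max_dx = max(abs(x - cx) for _, x in pixels) or 1.0
            max_dy = max(abs(y - cy) for y, _ in pixels) or 1.0
            for y, x in pixels:
                hv[y][x] = ((x - cx) / max_dx, (y - cy) / max_dy)
    return hv


sem = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 2],
    [0, 1, 1, 1, 0, 2],
    [0, 0, 0, 0, 0, 0],
]
hv = synthesize_hv(sem)
```

Because components are derived from the semantic map alone, touching nuclei of the same class merge into one component under this scheme, which is the price of not requiring instance annotations.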

Cell Classes (PanNuke Ontology)

| Class | Label | Biological meaning |
|---|---|---|
| 0 | Background | Non-cellular tissue / void |
| 1 | Neoplastic | Tumour nuclei |
| 2 | Inflammatory | Immune cells (lymphocytes, neutrophils, etc.) |
| 3 | Connective | Stromal / connective tissue nuclei |
| 4 | Dead | Necrotic / apoptotic nuclei |
| 5 | Non-neoplastic Epithelial | Normal epithelial nuclei |

Training Configuration — Stage 1

| Hyperparameter | Value |
|---|---|
| Training dataset | PanNuke: 7,901 patches, 19 tissue types, 189,744 annotated nuclei |
| Training split | 80% train (6,321 patches), 20% held-out val (1,580 patches) |
| Input resolution | 256×256 px at 0.25 µm/px → resized to 224×224 for backbone |
| Backbone | UNI2-h (frozen-then-fine-tuned) |
| Optimiser | AdamW (β₁=0.9, β₂=0.999, weight_decay=1×10⁻²) |
| Learning rate | 5×10⁻⁵ with linear decay: LR(t) = 5×10⁻⁵ × (T−t)/T (no warmup) |
| Per-device batch size | 8 |
| Gradient accumulation | 1 step (effective batch = 8 per GPU) |
| Number of GPUs | 8 × NVIDIA A100 (DDP) |
| Training epochs | 249 |
| Total optimiser steps | 24,651 |
| Mixed precision | bfloat16 |
| Compilation | `torch.compile` (inductor backend) |
| Eval frequency | every 500 steps |
| Checkpoint metric | Validation mean IoU (↑) |
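
The step count and learning-rate schedule follow from the values above. The sketch below reproduces the arithmetic; rounding the steps per epoch up (that is, keeping the last partial batch) is an assumption, chosen because it matches the reported total of 24,651 steps.

```python
import math

TRAIN_PATCHES = 6_321
PER_DEVICE_BATCH = 8
NUM_GPUS = 8
GRAD_ACCUM = 1
EPOCHS = 249
BASE_LR = 5e-5

# Patches consumed per optimiser step across all GPUs
global_batch = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM
# Assumption: the final partial batch of each epoch is kept, hence ceil
steps_per_epoch = math.ceil(TRAIN_PATCHES / global_batch)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step):
    """Linear decay without warmup: LR(t) = BASE_LR * (T - t) / T."""
    return BASE_LR * (total_steps - step) / total_steps
```

With these numbers the global batch is 64, each epoch takes 99 steps, and the learning rate reaches zero exactly at step 24,651.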

Augmentation (each with p=0.5): colour jitter (brightness/contrast/saturation ±20%, hue ±5%), HLS-space multiplicative perturbation ∈[0.9, 1.1], horizontal flip, vertical flip.

Results

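| Checkpoint | Epoch | Step | Val mean IoU | Eval loss |
|---|---|---|---|---|
|  | 1 | 99 | 0.408 | 0.391 |
|  | 25 | 2,475 | 0.784 | 0.072 |
|  | 50 | 4,950 | 0.808 | 0.058 |
|  | 100 | 9,900 | 0.861 | 0.041 |
|  | 150 | 14,850 | 0.893 | 0.033 |
|  | 200 | 19,800 | 0.917 | 0.027 |
| best | 249 | 24,651 | 0.9313 | 0.025 |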

Evaluation protocol: mIoU is macro-averaged Jaccard across all 6 classes on the PanNuke 20% held-out split (human ground-truth labels). Void pixels (label=255) are excluded. Best checkpoint is the final checkpoint — the model was still improving at end of training.
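
For reference, a macro-averaged Jaccard with void exclusion can be computed as in the sketch below. The handling of classes that appear in neither the prediction nor the target (skipped here) is an assumption; the evaluation code may treat them differently.

```python
def mean_iou(pred, target, num_classes=6, ignore_index=255):
    """Macro-averaged Jaccard over flat label sequences.

    Assumption: classes absent from both prediction and target are skipped.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:  # void pixels are excluded from the metric
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 3]
score = mean_iou(pred, target)
```

In this toy example the per-class IoUs are 1/2, 2/3, and 1 for classes 0, 1, and 2, and the last pixel is ignored because its target is void.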

Inference

# MPP normalisation before inference
scale = mpp_input / 0.314   # target ~0.35 µm/px

# Tiling
# Tile 224×224 with 50% overlap (stride 112 px); zero-pad to multiple of 14
# Stitch semantic map (foreground-priority) and HV map (last-write)

# Watershed instance separation
# Energy = HV magnitude; EDT seeds; compactness=0.01

The model expects 224×224 RGB patches normalised with ImageNet statistics (μ=(0.485, 0.456, 0.406), σ=(0.229, 0.224, 0.225)).
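
The padding, tiling, and normalisation steps described above can be sketched as follows. The helper names are hypothetical, the flush final tile is an assumption about how image borders are covered, and scaling 8-bit values to [0, 1] before applying the ImageNet statistics is the usual convention rather than something this card states.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def pad_to_multiple(size, multiple=14):
    """Smallest length >= size that is divisible by `multiple`."""
    return -(-size // multiple) * multiple


def tile_origins(length, tile=224, stride=112):
    """Top-left offsets of tiles covering one axis with 50% overlap.

    Assumption: a final tile is shifted back so it ends flush with the border.
    """
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def normalise_pixel(rgb):
    """Scale an 8-bit RGB triple to [0, 1], then apply ImageNet mean and std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


padded = pad_to_multiple(500)   # one axis of a 500 px image
origins = tile_origins(padded)  # tile offsets along that axis
```

For a 500 px axis this pads to 504 px and yields tiles starting at 0, 112, 224, and 280, so every pixel is covered by at least one tile.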

Repository Contents

| File | Description |
|---|---|
| `model.safetensors` | Model weights: UNI2-h backbone + both decoder heads (~3.2 GB) |
| `trainer_state.json` | Full training log: per-step mIoU, loss, learning rate |
| `training_args.bin` | HuggingFace Trainer configuration |

Related Models in This Series

| Model | Stage | Training data | Best val mIoU | Eval basis |
|---|---|---|---|---|
| This model | 1 | PanNuke: 7,901 patches (GT) | 0.9313 | Human GT |
| TCGA-UT-0 | 2 | TCGA-UT Scale 0: 271K patches (PL) | 0.8197 | PL-val† |
| TCGA-UT-012345 | 3 | TCGA-UT Scales 0–5: 1.6M patches (PL) | 0.7724 | PL-val† |

†PL-val = pseudo-label validation (self-consistency on model-generated labels, not human GT). Cross-domain evaluation on PanNuke GT for M2/M3 is pending (requires dedicated inference run).

Data Availability

PanNuke dataset: https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke
UNI2-h backbone: https://huggingface.co/MahmoodLab/uni2-h
Pseudo-labelled TCGA-UT dataset: https://huggingface.co/datasets/mizjaggy18/tcga-ut-cell-instance-semantic
Training code: to be released upon paper acceptance

Citation

This model is part of the SegTME-UNI2 framework (manuscript under review, Computers in Biology and Medicine). Please also cite the UNI2 foundation model and PanNuke dataset:

@article{chen2024uni2,
  title={Towards a General-Purpose Foundation Model for Computational Pathology},
  author={Chen, Richard J and others},
  journal={Nature Medicine},
  year={2024}
}
@article{gamper2020pannuke,
  title={PanNuke Dataset Extension, Insights and Baselines},
  author={Gamper, Jevgenij and others},
  year={2020}
}


Safetensors: model size 0.8B params, tensor type F32
