TubuleSegmentation v1.0.0 — Seminiferous tubule segmentation and morphometry (mouse testis, H&E)
Semantic segmentation model for seminiferous tubules in H&E sections of mouse testis (Mus musculus, CF-1 strain). Segments 3 classes (0 = background, 1 = epithelium, 2 = lumen) and derives calibrated morphometric metrics. Architecture: EfficientNet-B4 encoder + UNet with a dual decoder (segmentation + boundary) and SCSE attention. Input 512×512, ImageNet normalization, scale 0.32 µm/px.
This release (v1.0.0) is the result of ~23 internal iterations of architecture and training design.
1. Segmentation performance (validation, n = 49)
| Class | IoU | Dice / F1 |
|---|---|---|
| Background (0) | 0.989 | 0.994 † |
| Epithelium (1) | 0.935 | 0.966 † |
| Lumen (2) | 0.931 | 0.964 † |
| Mean | mIoU 0.952 | Dice 0.975 † |
- Best checkpoint: epoch 47
All metrics are reported on the validation set, never on the training set.
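As a consistency check on the table above, the Dice column matches what the per-class identity Dice = 2·IoU / (1 + IoU) gives for the listed IoU values. The sketch below is not the evaluation code used for this release. It assumes both metrics are computed from the same pooled pixel counts, which is the case in which the identity holds exactly; the helper name `dice_from_iou` is hypothetical.

```python
def dice_from_iou(iou: float) -> float:
    """Dice (F1) implied by an IoU computed from the same TP/FP/FN counts."""
    return 2.0 * iou / (1.0 + iou)

# Per-class IoU values copied from the table above.
iou = {"background": 0.989, "epithelium": 0.935, "lumen": 0.931}

# Per-class Dice implied by each IoU, rounded to the table's precision.
dice = {name: round(dice_from_iou(value), 3) for name, value in iou.items()}

# Unweighted means over the three classes.
mean_iou = round(sum(iou.values()) / len(iou), 3)
mean_dice = round(sum(dice_from_iou(v) for v in iou.values()) / len(iou), 3)

print(dice, mean_iou, mean_dice)
```

The rounded results agree with the Dice / F1 column and with the Mean row.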
2. Agreement with manual measurement (ImageJ, gold standard) — validation, n = 39
Method comparison (predicted vs. manual freehand tracing). Bias = predicted − manual. 95% LoA = Bland-Altman limits of agreement.
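The agreement statistics in the tables below can be computed from paired measurements roughly as follows. This is a minimal sketch rather than the script used for this release: the function name `agreement`, the 1.96 × sample-SD convention for the limits of agreement, and the toy numbers are all assumptions made for illustration.

```python
import math
import statistics as st

def agreement(pred, manual):
    """Bias, 95% limits of agreement, MAE, Pearson r and Lin's CCC for paired data."""
    diff = [p - m for p, m in zip(pred, manual)]   # predicted minus manual
    bias = st.mean(diff)
    sd = st.stdev(diff)                            # sample SD of the differences
    mean_p, mean_m = st.mean(pred), st.mean(manual)
    var_p = st.pvariance(pred, mean_p)             # population variances, as in Lin's CCC
    var_m = st.pvariance(manual, mean_m)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, manual)) / len(pred)
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "mae": st.mean([abs(d) for d in diff]),
        "pearson_r": cov / math.sqrt(var_p * var_m),
        "ccc": 2.0 * cov / (var_p + var_m + (mean_p - mean_m) ** 2),
    }

# Toy data (not from the validation set): a uniform 6% overestimate.
manual = [100.0, 200.0, 300.0, 400.0]
pred = [1.06 * m for m in manual]
result = agreement(pred, manual)
print(result)
```

In the toy data Pearson r is essentially 1 while CCC is lower, because CCC penalizes the systematic offset and r does not. The same pattern appears in the tubule size metrics, where r ≈ 0.99 but CCC is 0.78 to 0.85.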
Tubule
| Metric | Pearson r | CCC (Lin) | Bias | Bias % | 95% LoA | MAE |
|---|---|---|---|---|---|---|
| Area (µm²) | 0.996 | 0.78 | +4043.8 | +11.9% | [2379.6, 5708.0] | 4043.8 |
| Major axis (µm) | 0.996 | 0.83 | +14.29 | +6.2% | [8.71, 19.86] | 14.29 |
| Minor axis (µm) | 0.995 | 0.80 | +11.71 | +6.2% | [8.03, 15.38] | 11.71 |
| Max Feret (µm) | 0.997 | 0.85 | +13.50 | +5.8% | [8.15, 18.85] | 13.50 |
| Min Feret (µm) | 0.995 | 0.82 | +11.05 | +5.9% | [7.22, 14.89] | 11.05 |
| Perimeter (µm) | 0.996 | 0.79 | +37.73 | +5.7% | [25.38, 50.07] | 37.73 |
| Aspect ratio | 0.999 | 1.00 | −0.0001 | +0.4% | [−0.013, 0.013] | 0.005 |
| Roundness | 0.997 | 0.99 | −0.007 | +0.9% | [−0.020, 0.007] | 0.007 |
Lumen
| Metric | Pearson r | CCC (Lin) | Bias | Bias % | 95% LoA | MAE |
|---|---|---|---|---|---|---|
| Area (µm²) | 0.993 | 0.95 | +1080.3 | +8.1% | [8.5, 2152.2] | 1080.3 |
| Major axis (µm) | 0.970 | 0.83 | +12.19 | +8.3% | [1.43, 22.96] | 12.19 |
| Minor axis (µm) | 0.988 | 0.88 | +9.86 | +8.7% | [3.59, 16.14] | 9.86 |
| Max Feret (µm) | 0.974 | 0.96 | +4.51 | +3.7% | [−6.51, 15.54] | 6.36 |
| Min Feret (µm) | 0.984 | 0.98 | +2.70 | +3.1% | [−4.88, 10.28] | 4.02 |
| Perimeter (µm) | 0.926 | 0.93 | −4.06 | +6.0% | [−113.88, 105.77] | 41.90 |
| Aspect ratio | 0.972 | 0.97 | −0.009 | +3.0% | [−0.118, 0.101] | 0.040 |
| Roundness | 0.931 | 0.80 | −0.061 | +8.0% | [−0.147, 0.024] | 0.062 |
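Shape vs. size: the dimensionless shape descriptors (aspect ratio, roundness) reach CCC ≥ 0.99 for the tubule. For the lumen, aspect ratio reaches CCC 0.97, while roundness is lower (CCC 0.80, bias −0.061). For the tubule, the ~12% area bias is therefore an approximately uniform overestimation that preserves shape rather than distorting it; the ~8% lumen area bias preserves aspect ratio but comes with a small underestimation of roundness. Per-metric Bland-Altman plots are in imagej_validation/.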
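One way to check the uniform-overestimation reading for the tubule: if a shape is enlarged isotropically so that its area grows by a fraction a, every linear dimension grows by sqrt(1 + a) − 1. The sketch below applies that relation to the tubule table; the isotropic-scaling model is an assumption used only for this check, and the function name is hypothetical.

```python
import math

def linear_bias_from_area_bias(area_bias_pct: float) -> float:
    """Percent change in linear size implied by a percent change in area,
    assuming the shape is scaled uniformly in all directions."""
    return (math.sqrt(1.0 + area_bias_pct / 100.0) - 1.0) * 100.0

# Tubule area bias (Bias % column of the tubule table).
expected_linear_pct = linear_bias_from_area_bias(11.9)

# Reported Bias % for the tubule's linear metrics (same table).
observed_pct = {
    "Major axis": 6.2,
    "Minor axis": 6.2,
    "Max Feret": 5.8,
    "Min Feret": 5.9,
    "Perimeter": 5.7,
}

# Difference between reported and expected linear bias, in percentage points.
residual = {name: round(v - expected_linear_pct, 1) for name, v in observed_pct.items()}
print(round(expected_linear_pct, 2), residual)
```

Under this assumption an 11.9% area bias implies a linear bias of about 5.8%, within half a percentage point of every linear tubule metric in the table. The lumen metrics follow this relation less closely.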
3. Data and training
| Item | Value |
|---|---|
| Tissue / species | Mouse testis (Mus musculus), CF-1 strain; seminiferous tubules, H&E |
| Dataset | LuGot16/tubules (HuggingFace) |
| Train | 273 images — 11 animals, 2 slides per animal |
| Train / Validation | 273 / 49 images, random 85/15 split (seed 42) |
| Split level | Image-level random split (not grouped by animal) |
| Scale | 0.32 µm/px |
| Input | 512×512, ImageNet normalization |
| Classes | 0 = background, 1 = epithelium, 2 = lumen |
| Augmentation | Macenko (stain normalization) + geometric |
| Post-processing | largest connected component → morphological closing → hole filling → lumen cleanup |
| Inference | 8× TTA (4 rotations × 2 flips) |
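The 8× TTA row can be read as the eight combinations of four 90° rotations with and without a horizontal flip. The sketch below shows one way to implement it with NumPy. How the eight predictions are merged is not stated in this card, so averaging class probabilities before the argmax is an assumption, as are the names `d4_transforms`, `predict_tta` and `dummy_predict`.

```python
import numpy as np

def d4_transforms():
    """Yield (forward, inverse) pairs: 2 flip states x 4 rotations = 8 views."""
    for flip in (False, True):
        for k in range(4):
            def forward(x, k=k, flip=flip):
                x = np.flip(x, axis=-1) if flip else x
                return np.rot90(x, k, axes=(-2, -1))

            def inverse(y, k=k, flip=flip):
                y = np.rot90(y, -k, axes=(-2, -1))  # undo the rotation first
                return np.flip(y, axis=-1) if flip else y

            yield forward, inverse

def predict_tta(predict, image):
    """Average per-class probabilities over the 8 views, then take the argmax.

    `predict` maps a (C, H, W) image to (num_classes, H, W) probabilities.
    """
    probs = [inverse(predict(forward(image))) for forward, inverse in d4_transforms()]
    return np.mean(probs, axis=0).argmax(axis=0)

# Stand-in for the real network: a pixelwise rule, so TTA must not change the labels.
def dummy_predict(x):
    return np.stack([x[0], 1.0 - x[0], np.zeros_like(x[0])])

image = np.random.default_rng(0).integers(0, 2, size=(3, 8, 8)).astype(float)
labels = predict_tta(dummy_predict, image)
```

Each prediction is mapped back to the original orientation before averaging, so the views are compared pixel for pixel.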
4. Robustness, known biases, and scope
Generalization (animals / stain intensities / fixative):
- Note on validation: the train/validation split is random at the image level, not grouped by animal, so images from the same animal may appear in both sets. The validation metrics in Sections 1–2 may therefore be optimistic with respect to generalization to fully unseen animals. The strongest evidence for cross-animal generalization is the external test described in the next item.
- External test: 129 images from other animals and different experiments (the training set comprises 11 animals across 3 experiments; the external images come from outside that set), with widely varying stain intensities and different fixatives. Satisfactory segmentation on 128/129; the single failure was a patch of extremely thin epithelium left unsegmented in one image. Macenko stain normalization during training contributes to this tolerance to stain and fixation variation.
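For readers who want to re-evaluate without animal overlap, an animal-grouped split can be sketched as follows. This is not the split used for this release, which is random at the image level. The function name, the image names and the animal IDs are hypothetical; only the 15% validation fraction and seed 42 are taken from this card.

```python
import random

def split_by_animal(image_to_animal, val_fraction=0.15, seed=42):
    """Assign whole animals to train or validation so no animal is in both."""
    animals = sorted(set(image_to_animal.values()))
    random.Random(seed).shuffle(animals)
    n_val = max(1, round(len(animals) * val_fraction))
    val_animals = set(animals[:n_val])
    train = sorted(img for img, a in image_to_animal.items() if a not in val_animals)
    val = sorted(img for img, a in image_to_animal.items() if a in val_animals)
    return train, val

# Hypothetical mapping: 22 images spread evenly over 11 animals.
image_to_animal = {f"img_{i:03d}": f"animal_{i % 11:02d}" for i in range(22)}
train_images, val_images = split_by_animal(image_to_animal)

train_animals = {image_to_animal[i] for i in train_images}
val_animals = {image_to_animal[i] for i in val_images}
print(len(train_images), len(val_images), sorted(val_animals))
```

With 11 animals, a 15% fraction holds out two whole animals, so the validation share of images depends on how many images each of those animals contributes.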
Known biases:
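- Systematic area overestimation: ~+12% (tubule) and ~+8% (lumen) vs. manual tracing. The offset is systematic rather than noise (r ≈ 0.99, with limits of agreement entirely above zero) and is attributable to the boundary convention: the model mask is slightly more generous than the manual freehand trace. For the tubule it preserves shape (aspect ratio and roundness CCC ≥ 0.99). Do not correct it with a factor derived from the validation set itself, since that factor would be fitted and evaluated on the same images.
- Lumen perimeter: good agreement on average (CCC 0.93, bias −4.06 µm) but wide limits of agreement (roughly ±110 µm), so individual measurements can deviate substantially; the deviation varies case by case due to tracing convention.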
Validated scope: single-tubule crops at 0.32 µm/px, H&E staining. Not validated on multi-tubule fields or other magnifications.
Attribution
Dataset curated by Lucila Gotfryd (image acquisition, annotation, design of the segmentation and morphometry approach, including the anatomically-motivated containment constraint and the choice to enforce tubular connectivity). Model implementation carried out with AI coding assistance under the author's direction.