maazshahbaz committed on
Commit 150ec31 · verified · 1 parent: 6d81569

Add Mean Validation Dice 0.7161 + validation_summary.json (per-case results from checkpoint_best)

Files changed (2)
  1. README.md +19 -12
  2. validation_summary.json +234 -0
README.md CHANGED
@@ -33,9 +33,10 @@ This is a single-fold pretrain checkpoint, intended as a starting point for down
  | **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
  | **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
  | **Hardware** | 1× NVIDIA H100 80GB, ~6h wall-time |
- | **Best EMA Pseudo Dice** | **0.8155** (epoch ~755) |
- | **Expected real test Dice** | ~0.82–0.84 via sliding-window inference |
- | **Comparison** | At the top of published nnU-Net Task06 baselines (0.69–0.78) |
 
  ## Files in this repo
 
@@ -133,17 +134,23 @@ Input images should be CT volumes named with the nnU-Net channel suffix: `<case_
 
  ## Evaluation
 
- | Metric | Value |
- |--------|-------|
- | Best EMA Pseudo Dice (fold 0 validation) | **0.8155** |
- | Pseudo Dice raw (jagged) range | 0.50–0.85 |
- | Final-epoch train loss | -0.85 |
- | Final-epoch val loss | -0.75 |
- | Train/val gap | ~0.10 (mild late-stage overfitting; `checkpoint_best` predates this) |
 
- The training plot (`progress.png`) shows a smooth Pseudo Dice climb from 0 → 0.7 in the first ~50 epochs and a slow refinement to 0.81 by epoch ~750. After that, train loss continues to drop while val loss plateaus — this is the overfitting signature, and nnU-Net's best-checkpoint mechanism preserves the pre-overfit weights.
 
- Note that **Pseudo Dice is voxel-pooled across validation patches**, not per-case averaged. Real test-time Dice (per-case, full-volume sliding-window inference) typically lands 0.5–3% higher than Pseudo Dice — so the 0.8155 number translates to roughly **0.82–0.84 real test Dice**, which we expect to confirm via `nnUNetv2_predict` on the 13 fold-0 validation cases.
 
  ## Limitations
 
 
  | **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
  | **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
  | **Hardware** | 1× NVIDIA H100 80GB, ~6h wall-time |
+ | **Mean Validation Dice** (per-case, sliding-window) | **0.7161** |
+ | **Best EMA Pseudo Dice** (in-training proxy) | 0.8155 (epoch ~755) |
+ | **Foreground IoU** (per-case avg) | ~0.59 (from `validation_summary.json`) |
+ | **Comparison** | Within published nnU-Net Task06 range (0.69–0.78 across various reports) |
 
  ## Files in this repo
 
 
 
  ## Evaluation
 
+ Two complementary Dice metrics, both computed on the 13 fold-0 validation cases:
 
+ | Metric | Value | What it measures |
+ |--------|-------|------------------|
+ | **Mean Validation Dice** (per-case, sliding-window) | **0.7161** | Per-case Dice from full-volume `nnUNetv2_predict` inference on each of the 13 val cases, averaged. **Case-weighted**: every scan counts equally regardless of tumor size. *This is the metric most papers report.* |
+ | **Best EMA Pseudo Dice** (in-training) | 0.8155 | Voxel-pooled Dice across validation patches during training, smoothed with an exponential moving average. **Voxel-weighted**: large tumors dominate. Used by nnU-Net to select `checkpoint_best.pth`. |
+ | Pseudo Dice raw (jagged) range | 0.50–0.85 | Spread of the raw per-epoch Pseudo Dice readings during training, before EMA smoothing. |
+ | Final-epoch train loss | -0.85 | Training loss (Dice + CE) at the last epoch; still falling late in training (see `progress.png`). |
+ | Final-epoch val loss | -0.75 | Validation loss at the last epoch; plateaued late in training. The ~0.10 train/val gap indicates mild late-stage overfitting, which `checkpoint_best.pth` predates. |
 
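+ The 0.10 gap between Pseudo Dice (0.8155) and Mean Validation Dice (0.7161) is **smaller than for datasets with more varied lesion sizes**, such as NLSTseg or Dataset500 (~0.15 gap there). MSD Task06's tumors are comparatively large (median volume 5.22 cm³), so voxel-pooled and per-case Dice stay reasonably close. The remaining gap is driven mostly by a few small lesions: the four lowest-scoring validation cases (lung_042, lung_079, lung_070, lung_059; Dice 0.23–0.60) all have reference masks under 5,000 voxels. In general, the smaller a dataset's lesions and the wider their size distribution, the bigger the Pseudo–Mean gap.
+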
+ The training plot (`progress.png`) shows the EMA Pseudo Dice climbing smoothly from 0 → 0.7 in the first ~50 epochs and refining slowly to 0.81 by epoch ~750, followed by mild overfitting (train loss keeps dropping while val loss plateaus). nnU-Net's best-checkpoint mechanism preserves the pre-overfit weights; that checkpoint is the model in this repo.
+
+ For comparisons against other methods, **cite the Mean Validation Dice (0.7161)**. Pseudo Dice is useful as an in-training monitoring signal but not for cross-method comparison.
+
+ Per-case validation results are available in `validation_summary.json` (Dice, IoU, TP/FP/FN counts per case).
 
  ## Limitations
 
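The two Dice numbers in the Evaluation table differ only in where the averaging happens. As a minimal sketch (not nnU-Net's own evaluation code), the snippet below recomputes both from the per-case TP/FP/FN counts in `validation_summary.json`, copied in by hand; `CaseCounts`, `case_averaged_dice` and `voxel_pooled_dice` are hypothetical names chosen for illustration. It uses Dice = 2·TP / (2·TP + FP + FN), which reproduces the per-case `Dice` values stored in the file. Requires Python 3.9+.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseCounts:
    """Foreground (label "1") voxel counts for one validation case."""
    case_id: str
    tp: int
    fp: int
    fn: int


# TP / FP / FN per case, copied by hand from validation_summary.json (fold 0).
CASES = [
    CaseCounts("lung_006", 6614, 1439, 436),
    CaseCounts("lung_010", 26047, 7808, 1264),
    CaseCounts("lung_033", 19314, 1820, 19789),
    CaseCounts("lung_034", 12612, 1597, 1503),
    CaseCounts("lung_041", 65217, 8514, 10417),
    CaseCounts("lung_042", 926, 2519, 3563),
    CaseCounts("lung_046", 11585, 1073, 1696),
    CaseCounts("lung_048", 3389, 483, 615),
    CaseCounts("lung_059", 1690, 2042, 175),
    CaseCounts("lung_065", 26065, 7592, 5341),
    CaseCounts("lung_066", 7183, 3235, 862),
    CaseCounts("lung_070", 2593, 4287, 487),
    CaseCounts("lung_079", 954, 1, 2019),
]


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice = 2TP / (2TP + FP + FN); NaN if both prediction and reference are empty."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else float("nan")


def case_averaged_dice(cases: list[CaseCounts]) -> float:
    """Dice per case, then the plain mean: every scan has equal weight."""
    return sum(dice(c.tp, c.fp, c.fn) for c in cases) / len(cases)


def voxel_pooled_dice(cases: list[CaseCounts]) -> float:
    """Sum the counts over all cases, then compute one Dice: large lesions dominate."""
    tp = sum(c.tp for c in cases)
    fp = sum(c.fp for c in cases)
    fn = sum(c.fn for c in cases)
    return dice(tp, fp, fn)


print(f"case-averaged Dice: {case_averaged_dice(CASES):.4f}")  # 0.7161
print(f"voxel-pooled Dice:  {voxel_pooled_dice(CASES):.4f}")   # 0.8026
```

On these 13 cases the case-averaged value matches the 0.7161 in the file, while pooling the counts first gives about 0.803. Pooling lets lung_041 (75,634 reference voxels, Dice 0.87) outweigh lung_042 (4,489 voxels, Dice 0.23), whereas the case average gives each scan the same weight. The pooled number here is computed on full volumes, so it is not the same quantity as the patch-based Pseudo Dice, but it is consistent with the README's explanation that the gap stems mainly from voxel weighting versus case weighting.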
validation_summary.json ADDED
@@ -0,0 +1,234 @@
+ {
+     "foreground_mean": {
+         "Dice": 0.7161166470256257,
+         "FN": 3705.153846153846,
+         "FP": 3262.3076923076924,
+         "IoU": 0.5904215376842531,
+         "TN": 68156469.07692307,
+         "TP": 14168.384615384615,
+         "n_pred": 17430.69230769231,
+         "n_ref": 17873.53846153846
+     },
+     "mean": {
+         "1": {
+             "Dice": 0.7161166470256257,
+             "FN": 3705.153846153846,
+             "FP": 3262.3076923076924,
+             "IoU": 0.5904215376842531,
+             "TN": 68156469.07692307,
+             "TP": 14168.384615384615,
+             "n_pred": 17430.69230769231,
+             "n_ref": 17873.53846153846
+         }
+     },
+     "metric_per_case": [
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8758524796398066,
+                     "FN": 436,
+                     "FP": 1439,
+                     "IoU": 0.7791259276711038,
+                     "TN": 148627159,
+                     "TP": 6614,
+                     "n_pred": 8053,
+                     "n_ref": 7050
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_006.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_006.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8516823071641108,
+                     "FN": 1264,
+                     "FP": 7808,
+                     "IoU": 0.7416782938010763,
+                     "TN": 63141585,
+                     "TP": 26047,
+                     "n_pred": 33855,
+                     "n_ref": 27311
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_010.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_010.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.6412669953682952,
+                     "FN": 19789,
+                     "FP": 1820,
+                     "IoU": 0.4719595337585221,
+                     "TN": 68116517,
+                     "TP": 19314,
+                     "n_pred": 21134,
+                     "n_ref": 39103
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_033.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_033.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8905521818952126,
+                     "FN": 1503,
+                     "FP": 1597,
+                     "IoU": 0.8026985743380856,
+                     "TN": 77578912,
+                     "TP": 12612,
+                     "n_pred": 14209,
+                     "n_ref": 14115
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_034.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_034.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8732567870652429,
+                     "FN": 10417,
+                     "FP": 8514,
+                     "IoU": 0.7750273327946,
+                     "TN": 62830412,
+                     "TP": 65217,
+                     "n_pred": 73731,
+                     "n_ref": 75634
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_041.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_041.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.23342576254096295,
+                     "FN": 3563,
+                     "FP": 2519,
+                     "IoU": 0.13213470319634704,
+                     "TN": 32760992,
+                     "TP": 926,
+                     "n_pred": 3445,
+                     "n_ref": 4489
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_042.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_042.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8932495470141486,
+                     "FN": 1696,
+                     "FP": 1073,
+                     "IoU": 0.8070920997631322,
+                     "TN": 59230190,
+                     "TP": 11585,
+                     "n_pred": 12658,
+                     "n_ref": 13281
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_046.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_046.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8605891315388522,
+                     "FN": 615,
+                     "FP": 483,
+                     "IoU": 0.7552930688656118,
+                     "TN": 84405881,
+                     "TP": 3389,
+                     "n_pred": 3872,
+                     "n_ref": 4004
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_048.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_048.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.60389494371985,
+                     "FN": 175,
+                     "FP": 2042,
+                     "IoU": 0.43255694906577935,
+                     "TN": 57143485,
+                     "TP": 1690,
+                     "n_pred": 3732,
+                     "n_ref": 1865
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_059.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_059.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.8012234295990041,
+                     "FN": 5341,
+                     "FP": 7592,
+                     "IoU": 0.6683676085953126,
+                     "TN": 33515434,
+                     "TP": 26065,
+                     "n_pred": 33657,
+                     "n_ref": 31406
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_065.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_065.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.7780967340085576,
+                     "FN": 862,
+                     "FP": 3235,
+                     "IoU": 0.636790780141844,
+                     "TN": 63165424,
+                     "TP": 7183,
+                     "n_pred": 10418,
+                     "n_ref": 8045
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_066.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_066.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.5206827309236948,
+                     "FN": 487,
+                     "FP": 4287,
+                     "IoU": 0.35197502375458123,
+                     "TN": 69722937,
+                     "TP": 2593,
+                     "n_pred": 6880,
+                     "n_ref": 3080
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_070.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_070.nii.gz"
+         },
+         {
+             "metrics": {
+                 "1": {
+                     "Dice": 0.48574338085539714,
+                     "FN": 2019,
+                     "FP": 1,
+                     "IoU": 0.32078009414929387,
+                     "TN": 65795170,
+                     "TP": 954,
+                     "n_pred": 955,
+                     "n_ref": 2973
+                 }
+             },
+             "prediction_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/validation/lung_079.nii.gz",
+             "reference_file": "/proj/rasool_lab_projects/Maaz/cln-segmenter/data/msd_task06_nnunet/nnunet_preprocessed/Dataset502_MSDLung/gt_segmentations/lung_079.nii.gz"
+         }
+     ]
+ }
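For scripting against the per-case results, the standard `json` module is enough. The sketch below assumes the layout of `validation_summary.json` in this commit (a `metric_per_case` list whose entries hold `metrics["1"]` plus a `prediction_file` path); `per_case_rows`, `load_per_case` and `is_consistent` are hypothetical helpers, not part of nnU-Net. The consistency check also spells out how the columns relate: `n_pred = TP + FP`, `n_ref = TP + FN`, and for a single case Dice = 2·IoU / (1 + IoU). Requires Python 3.9+ for `str.removesuffix`.

```python
import json
from pathlib import Path


def per_case_rows(summary: dict) -> list[tuple[str, dict]]:
    """Return (case_id, label-"1" metrics) pairs from a parsed validation summary."""
    rows = []
    for entry in summary["metric_per_case"]:
        # ".../fold_0/validation/lung_006.nii.gz" -> "lung_006"
        case_id = Path(entry["prediction_file"]).name.removesuffix(".nii.gz")
        rows.append((case_id, entry["metrics"]["1"]))
    return rows


def load_per_case(path: str) -> list[tuple[str, dict]]:
    """Read validation_summary.json from disk and extract the per-case rows."""
    return per_case_rows(json.loads(Path(path).read_text()))


def is_consistent(m: dict, tol: float = 1e-9) -> bool:
    """Check stored Dice/IoU/n_pred/n_ref against TP/FP/FN.

    Assumes at least one foreground voxel in the prediction or the reference.
    """
    tp, fp, fn = m["TP"], m["FP"], m["FN"]
    counts_ok = m["n_pred"] == tp + fp and m["n_ref"] == tp + fn
    dice_ok = abs(m["Dice"] - 2 * tp / (2 * tp + fp + fn)) < tol
    iou_ok = abs(m["IoU"] - tp / (tp + fp + fn)) < tol
    # Per case, Dice and IoU describe the same overlap: Dice = 2*IoU / (1 + IoU).
    link_ok = abs(m["Dice"] - 2 * m["IoU"] / (1 + m["IoU"])) < tol
    return counts_ok and dice_ok and iou_ok and link_ok
```

For example, `sorted(load_per_case("validation_summary.json"), key=lambda r: r[1]["n_ref"])` lists the cases from smallest to largest reference mask, which makes the size/Dice pattern easy to inspect. The Dice/IoU identity holds per case only: plugging the mean IoU (0.5904) into it gives about 0.742, not the mean Dice of 0.7161, because averaging does not commute with that nonlinear map.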