# Multi-Task Learning for Glacier Segmentation and Calving Front Prediction with the nnU-Net.
This project contains the script for the experiments that are described in the paper "Out-of-the-box calving front detection method using deep-learning" by Herrmann et al. https://tc.copernicus.org/preprints/tc-2023-34/
The project was built upon the nnU-Net project by Isensee, F., Jaeger, P. F. (2020) https://github.com/MIC-DKFZ/nnUNet. The folders that are new to the project are marked as "xx_new". I tried to change as few of the original files as possible and to create new ones instead, but this was not always feasible.
## Out-of-the-box calving front detection
To apply the trained nnU-Net to a set of SAR images for calving front detection, follow the steps below:
1. Download this repository and extract the files https://github.com/ho11laqe/nnUNet_calvingfront_detection.git
2. Download the pretrained model from Zenodo and extract the zip-file https://zenodo.org/record/7837300#.ZD1OI9IzbUA.
3. Install the repository
- Create a new virtual environment with `python3 -m venv /path/to/venv/nnunet` and replace the path with the location
where the virtual environment should be installed.
- Activate the environment with `source /path/to/venv/nnunet/bin/activate`.
- Install the repository by entering `pip install -e /path/to/extraced/repository/nnunet_clavingfront` and replace the path.
4. Run the calving front prediction with `bash RUN_CALBINGFRONT_DETECTION.sh -d /path/to/SARimages/ -m /path/to/pretrained/model/`, replacing the paths
with the path to the folder containing the SAR images and the path to the pretrained model.
## 1. Dataset
The dataset is provided by Gourmelon et al. and can be found [here](https://doi.pangaea.de/10.1594/PANGAEA.940950).
It contains 681 SAR images of seven glaciers taken by seven different satellites. Two glaciers are located in the
northern hemisphere and five in the southern hemisphere. The two glaciers in the northern hemisphere are the Columbia
Glacier in Alaska and the Jakobshavn Glacier in Greenland. Both glaciers are
famous representatives of their regions because they are two of the largest tidewater glaciers in the world. The
Columbia Glacier has a length of 51 km and a thickness of 550 m. The glacier has been retreating at a rate of
approximately 0.6 km per year since 1982. Jakobshavn has a length of 65 km, a thickness of
2000 m, and retreated 27.4 km between 1902 and 2010. The five glaciers in the southern
hemisphere are all located on the Antarctic Peninsula.
<img src="Figures/studysites.png" width="1024px" />
<img src="Figures/dataset_numbers.png" width="1024px" />
Properties of the dataset, including the list of captured glaciers, the train-test split,
the number of images per glacier, and the covered area in km.
The dataset contains two labels for each glacier image. One is a mask of the different
zones of the glacier (ocean, glacier, radar shadow, rock). The other label contains a 1 pixel
wide line representing the calving front. A sample of each glacier in the training set with
its corresponding labels is shown in Figure 2. Predicting the zone mask can be seen as a
classic segmentation problem. The only peculiarity is that every pixel is associated with a
specific class, so there is no general ’background’ class for unclassifiable pixels. Because
of the high class imbalance, the calving front delineation is the more difficult task: fewer
than 1 % of the pixels are labeled as front. Additionally, the front class does not form
a compact region but a thin line.
<img src="Figures/dataset.png" width="1024px" />
Figure 2: Sample images of every glacier in the train set and their corresponding labels.
The first row shows the front label with black background and a 1 pixel wide white line
representing the calving front. The second row contains the zone labels with four classes:
ocean (white), glacier (light gray), rock (dark gray), radar shadow (black).
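The class imbalance of the front label can be made concrete with a small calculation. The following sketch counts the pixels per class in a synthetic front mask. The mask size, the class codes (0 = background, 1 = front), and the helper `class_fractions` are illustrative assumptions and are not taken from the repository.

```python
def class_fractions(mask):
    """Return {class_value: fraction of pixels} for a 2D label mask."""
    counts = {}
    total = 0
    for row in mask:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
            total += 1
    return {value: n / total for value, n in counts.items()}


# Hypothetical 200 x 200 front label: background 0, one-pixel-wide vertical front 1.
height, width = 200, 200
front_mask = [[1 if x == 120 else 0 for x in range(width)] for _ in range(height)]

fractions = class_fractions(front_mask)
print(f"front pixels: {fractions[1]:.2%}")
```

In this toy mask, only 200 of the 40,000 pixels belong to the front, which illustrates why a loss that treats all pixels equally is dominated by the background.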
Every glacier is captured by multiple satellites for a higher temporal resolution. This means
that recordings of one glacier are captured by different SAR systems with different
resolutions. In Figure 3, a timeline of the images of each glacier visualizes the observation
time and frequency of the images. The first two rows show the glaciers of the test set.
<img src="Figures/satellites.png" width="1024px" />
Figure 3: Distribution of the dataset’s images over time. The samples are grouped by the
seven glaciers, and colored according to the capturing satellite.
## 2. nnU-Net
The nnU-Net by Fabian Isensee et al. [Ise+21] reduces the hyperparameter
search by taking a fingerprint of the dataset and adjusting hyperparameters accordingly.
Additionally, there are fixed parameters. These parameters are based on the authors’
experience and generalize well across a variety of tasks. The structure of the nnU-Net is
visualized in Figure 4.
<img src="Figures/nnUnet.png" width="1024px" />
Figure 4: Illustration of the nnU-Net framework created by Isensee et al. [Ise+21]
I retraced the pipeline of the nnU-Net and created the following visualizations. Figure 5 shows the whole pipeline
including the added python scripts. The data samples and the labels have to be in the Neuroimaging Informatics Technology
Initiative (NIfTI) file format, separated into test and training samples. The NIfTI file format
was developed for neuroimaging. The files store 3D scans of brains or other organs. The
format stores additional information about the orientation of the data, distances between
the individual pixels/voxels, and layers. Because the nnU-Net was developed for medical
imaging, it uses this file format.
<img src="Figures/Pipeline.png" width="1024px" />
Figure 5: Scripts for conversion between PNG and NIfTI (blue), nnU-Net scripts (purple),
evaluation scripts (green).
### 2.1 Preprocessing
The nnU-Net crops borders before the
dataset fingerprint is created. While the dataset is perused for black borders, properties of
every image, including size and spacing, are stored. After the cropping, every sample and
its corresponding label is stacked into one NIfTI file. Finally, the dataset’s fingerprint is
created by analyzing the dataset. The fingerprint includes the size and spacing of every
sample, the number of classes, the imaging modality, and the intensity distribution.
Based on the dataset’s fingerprint and the available Random Access Memory (RAM)
of the Graphics Processing Unit (GPU), a plan for the training and architecture of the
U-Net is created. The hyperparameters concerning the architecture are the patch size and
the number of layers. Most often, using the whole data sample as the input for the U-Net
results in a massive number of parameters and will not fit on traditional GPUs. Therefore,
the image is divided into smaller parts called patches. Their segmentation mask is stitched
together afterwards to get a segmentation of the whole image. The patch size is initialized
with the median image shape and iteratively reduced until at least two images can be
processed in parallel. The number of images passed through the network in parallel is called the
batch size, and a larger batch size provides a more stable training. Here, a larger patch size is preferred over a
larger batch size to provide more contextual information for the segmentation. The patch
size also represents the size of the first and last layer of the U-Net.
<img src="Figures/preprocessing.png" width="1024px" />
Figure 6: Plan and preprocessing pipeline including generated files. The python scripts
are reduced to the important functions.
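The iterative reduction of the patch size can be sketched as follows. This is a simplified illustration, not the planner of the nnU-Net: the linear cost model, the step size of 64 pixels, the median shape, and the memory budget are all hypothetical, whereas the real planner estimates the memory consumption from the network topology.

```python
def plan_patch_size(median_shape, memory_budget, min_batch_size=2, step=64):
    """Shrink the patch, starting at the median image shape, until a batch fits the budget."""
    patch = list(median_shape)
    # Hypothetical cost model: memory grows linearly with the number of pixels per batch.
    while min_batch_size * patch[0] * patch[1] > memory_budget:
        axis = 0 if patch[0] >= patch[1] else 1  # always shrink the longer side
        if patch[axis] <= step:
            raise ValueError("memory budget too small for any patch")
        patch[axis] -= step
    return tuple(patch)


# Hypothetical numbers: median image shape in pixels and a budget in "pixel units".
median_shape = (1536, 1280)
memory_budget = 2 * 1024 * 896
patch_size = plan_patch_size(median_shape, memory_budget)
print(patch_size)
```

The batch size is kept at its minimum of two while the patch shrinks, which mirrors the preference for a large patch over a large batch.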
### 2.2 Training
Before the training starts, the network trainer and the network have to be initialized with
the parameters generated by the previous step. The trainer’s parameters are learning rate,
loss function, the maximum number of epochs, optimizer, and dataloader. The dataloader
is responsible for creating the patches, batches, and augmentation of the samples. There
are 11 augmentation steps in the nnU-Net listed in the Table below.
<img src="Figures/augmentation.png" width="1024px" />
In the next step, the network is created based on the generated parameters. The U-Net
consists of multiple blocks (in this work: nine encoder blocks and eight decoder blocks).
The encoder block and the decoder block are illustrated in Figure 7. The encoder block
contains two convolutional layers. Each convolutional layer is followed by an instance normalization and
the activation function (leaky rectified linear unit). For instance normalization, the mean
and variance are calculated for every feature map. Afterwards, the mean is subtracted from the feature map
and the result is divided by the standard deviation. The decoder block takes as input the output of the previous
block and the output of the corresponding encoder block. The output of the previous block
is scaled up with a transposed convolution and then concatenated with the encoder output.
Apart from that, the decoder block has the same structure as the encoder block. The output is used
by the next block and by the deep supervision part.
<img src="Figures/unetblocks.png" width="1024px" />
Figure 7: Illustration of the encoder and decoder blocks that make up the architecture of
the nnU-Net. The encoder and the decoder contain multiple blocks.
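The normalization and activation that follow each convolution can be sketched for a single feature map as below. This is a minimal pure-Python illustration; the epsilon of 1e-5 and the negative slope of 0.01 are assumed default values, and the function names are chosen for this example only.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize one 2D feature map to zero mean and (approximately) unit variance."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance + eps)  # eps avoids division by zero for constant maps
    return [[(v - mean) / std for v in row] for row in feature_map]


def leaky_relu(value, negative_slope=0.01):
    """Pass positive values unchanged and scale negative values by a small slope."""
    return value if value >= 0 else negative_slope * value


feature_map = [[1.0, 2.0], [3.0, 4.0]]
normalized = instance_norm(feature_map)
activated = [[leaky_relu(v) for v in row] for row in normalized]
print(activated)
```

Because the statistics are computed per feature map and per sample, the normalization does not depend on the batch size, which suits the small batches used here.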
After everything is initialized, the network is trained to minimize the loss function.
The loss function of the nnU-Net is the sum of the cross-entropy loss and the Dice loss. Typically, one epoch
corresponds to feeding every sample of the dataset to the network once. The nnU-Net instead defines a fixed
number of iterations (250) as one epoch. In addition, the dataloader ensures that at least
one-third of the samples contain a randomly chosen foreground class. This is especially helpful
for the class imbalance of the front label, where most of the patches do not contain any
front pixels.
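The foreground oversampling can be sketched as follows. This is a simplified illustration under stated assumptions: class 0 is treated as background, the function `sample_patch_centers` and its parameters are hypothetical, and the actual dataloader of the nnU-Net differs in its details.

```python
import math
import random


def sample_patch_centers(label, batch_size, foreground_fraction=1 / 3, rng=None):
    """Pick one patch centre per batch element; the last ones are forced onto foreground."""
    rng = rng or random.Random()
    height, width = len(label), len(label[0])
    # Group pixel coordinates by foreground class (0 is treated as background).
    pixels_by_class = {}
    for y in range(height):
        for x in range(width):
            if label[y][x] != 0:
                pixels_by_class.setdefault(label[y][x], []).append((y, x))
    n_forced = math.ceil(batch_size * foreground_fraction) if pixels_by_class else 0
    centers = []
    for index in range(batch_size):
        if index >= batch_size - n_forced:
            # Forced sample: choose a random foreground class, then one of its pixels.
            chosen_class = rng.choice(sorted(pixels_by_class))
            centers.append(rng.choice(pixels_by_class[chosen_class]))
        else:
            centers.append((rng.randrange(height), rng.randrange(width)))
    return centers


# Hypothetical 8 x 8 front label with a one-pixel-wide front in column 5.
label = [[1 if x == 5 else 0 for x in range(8)] for _ in range(8)]
centers = sample_patch_centers(label, batch_size=2, rng=random.Random(0))
print(centers)
```

With a batch size of two, one of the two patches is always centred on a front pixel, so every iteration sees the rare class.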
<img src="Figures/training2.svg" width="1024px" />
### 2.3 Post-processing
The trained model can be used to detect the target in unseen data. First, the model files
are loaded from the specified fold. Afterwards, the network preprocesses the hold-out test
set. The test samples are divided into patches in the same way as the training samples. For a robust
result, the patches are rotated three times, and the three resulting predictions are then
combined by averaging the pixel values. The accuracy of the network decreases towards the
borders of the patches; therefore, the predictions are weighted by a Gaussian bell. The
patches overlap by half of the patch size to get a smoother final result. Finally, the predictions are stored as NIfTI
files in the specified folder. The inference script and its steps are illustrated in the figure at the end of this section.
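The combination of overlapping patches with Gaussian weights can be sketched in one dimension as follows. The sketch is an illustration only: it works on a 1D signal instead of a 2D image, leaves out the rotations, and the standard deviation of one eighth of the patch size as well as the function names are assumptions.

```python
import math


def gaussian_weights(size, sigma_scale=0.125):
    """Bell-shaped weights that peak in the patch centre and fall off towards the borders."""
    center = (size - 1) / 2
    sigma = size * sigma_scale
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(size)]


def sliding_window_predict(signal, patch_size, predict_patch):
    """Predict overlapping patches (step = half a patch) and blend them with Gaussian weights."""
    step = patch_size // 2
    starts = list(range(0, len(signal) - patch_size + 1, step))
    if starts[-1] != len(signal) - patch_size:
        starts.append(len(signal) - patch_size)  # make sure the end of the signal is covered
    weights = gaussian_weights(patch_size)
    accumulated = [0.0] * len(signal)
    weight_sum = [0.0] * len(signal)
    for start in starts:
        prediction = predict_patch(signal[start:start + patch_size])
        for offset, (value, weight) in enumerate(zip(prediction, weights)):
            accumulated[start + offset] += weight * value
            weight_sum[start + offset] += weight
    return [a / w for a, w in zip(accumulated, weight_sum)]


# Hypothetical "network" that simply returns its input patch.
signal = [float(i) for i in range(11)]
merged = sliding_window_predict(signal, patch_size=4, predict_patch=lambda patch: patch)
print(merged)
```

Dividing by the accumulated weights keeps the result on the same scale as a single prediction, regardless of how many patches cover a pixel.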
<img src="Figures/postprocessing.png" width="1024px" />
## 3. Adjustments of the nnU-Net Pipeline
For multi-task learning (MTL), two main approaches can
be distinguished. The approach that requires the smallest change to the vanilla U-Net
adds the second label as a second channel to the last layer (late-branching).
Only the parameters of one additional kernel in the last layer have to be trained,
so the total number of trainable parameters changes insignificantly. The second
approach uses one decoder for every label (early-branching).
<img src="Figures/early_late_branching.png" width="1024px" />
The PNGs of glaciers had to be converted to the NIfTI file format. Because the glacier
labels are 2D, the two labels were stacked in one label file with the label of the front located
at z = 0 and the zone masks at z = 1. In the dataset.json, which contains the dataset’s
metadata, the label entry contains a list of multiple labels with multiple classes instead of a
single list of classes. After the dataset is in the desired directory and format, the nnU-Net
scripts can be executed.
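The stacking of the two 2D labels and the extended label entry can be sketched as follows. This is an illustration under assumptions: the nested lists stand in for the NIfTI volume, the axis order, the class codes, and the exact layout of the label entry in `dataset.json` are hypothetical and may differ from the files in this repository.

```python
import json


def stack_labels(front_mask, zone_mask):
    """Stack two 2D masks of equal shape into one volume indexed as volume[z][y][x]."""
    if len(front_mask) != len(zone_mask) or len(front_mask[0]) != len(zone_mask[0]):
        raise ValueError("front and zone masks must have the same shape")
    return [front_mask, zone_mask]  # z = 0: front label, z = 1: zone label


# Hypothetical 2 x 3 masks with assumed class codes.
front = [[0, 1, 0], [0, 1, 0]]
zones = [[1, 1, 0], [1, 1, 0]]
volume = stack_labels(front, zones)

# Assumed layout: one class dictionary per label instead of a single class dictionary.
labels_entry = [
    {"0": "background", "1": "front"},
    {"0": "ocean", "1": "glacier", "2": "rock", "3": "radar shadow"},
]
print(json.dumps({"labels": labels_entry}, indent=2))
```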
Changes in the preprocessing mainly concern the added dimension of the labels in
`dataset.json`: there are now multiple labels, each with multiple classes, instead of
one label with multiple classes. During the planning of the experiment, the
estimated size of the network architecture is requested. This work implements a new
class `Generic_UNet_MTLearly`, which returns the network’s size in the method
`compute_approx_vram_consumption`. For comparability, this value is also used for the
late-branching network, even though that network is smaller. Otherwise, early- and late-branching networks
would be trained on different patch sizes. `Generic_UNet_MTLearly` is derived from the
given class `Generic_UNet`, which is included in the framework and is used in this work
for the single-task segmentation. `Generic_UNet_MTLearly` contains a second decoder, which is created in the initialization of every instance of the class and used in the
forward function. The outputs of both decoders are concatenated before they are returned.
Another class is responsible for the training of the network. The given `nnUNetTrainerV2`
was used for the single-task segmentation. For MTL, the new trainers `nnUNetTrainerMTLearly`
and `nnUNetTrainerMTLlate` were derived from the single-task trainer. These trainer
classes contain hyperparameters, e.g., the maximum number of epochs and the deep supervision
scales. They also trigger the initialization of the network, run the forward pass, compute the loss,
and trigger the update of the weights. The initialization of the network is done in the
aforementioned `Generic_UNet` classes. For late-branching, the last layer and the layers
for deep supervision are modified to create two channel outputs. For early-branching, the
decoder is duplicated, and the results of the decoders are concatenated before the return of
the forward pass. After every iteration, the error of both labels is calculated as described in
Section 3.6.2 and summed up with equal weighting (unitary scalarization).
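Unitary scalarization simply adds the per-task losses with a weight of one each. A minimal sketch, in which the function name and the loss values are hypothetical:

```python
def combine_task_losses(task_losses, weights=None):
    """Weighted sum of per-task losses; weights of 1 for every task give unitary scalarization."""
    if weights is None:
        weights = [1.0] * len(task_losses)
    if len(weights) != len(task_losses):
        raise ValueError("one weight per task is required")
    return sum(weight * loss for weight, loss in zip(weights, task_losses))


# Hypothetical loss values of one training iteration.
front_loss = 0.42
zone_loss = 0.17
total_loss = combine_task_losses([front_loss, zone_loss])
print(total_loss)
```

The optional `weights` argument only shows where a different weighting scheme would enter; the experiments described here use equal weights.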
Only minor changes had to be made in the inference script (`predict_simple.py`). After
the test samples are divided into patches and fed through the network, the multiple channels
of the network’s output have to be separated before the patch predictions are composed into
the prediction of the whole image. A list of the edited and added scripts is provided in the following table.
<img src="Figures/listoffiles.png" width="700" />
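The separation of the output channels during inference can be sketched as follows. The channel order (front channels first, zone channels second), the number of classes per task, and the helper names are assumptions for illustration; the implementation in `predict_simple.py` may differ.

```python
def split_channels(output, classes_per_task):
    """Split a list of channel maps into one list of channel maps per task."""
    if sum(classes_per_task) != len(output):
        raise ValueError("channel count does not match the classes per task")
    tasks, start = [], 0
    for n_classes in classes_per_task:
        tasks.append(output[start:start + n_classes])
        start += n_classes
    return tasks


def argmax_map(channels):
    """Return, for every pixel, the index of the channel with the highest score."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [max(range(len(channels)), key=lambda c: channels[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical network output for a 1 x 2 image: 2 front channels, then 4 zone channels.
output = [
    [[0.9, 0.2]], [[0.1, 0.8]],                              # front: background, front
    [[0.1, 0.7]], [[0.6, 0.1]], [[0.2, 0.1]], [[0.1, 0.1]],  # zones: four classes
]
front_channels, zone_channels = split_channels(output, [2, 4])
front_prediction = argmax_map(front_channels)
zone_prediction = argmax_map(zone_channels)
print(front_prediction, zone_prediction)
```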
## 4. Experiments
The first research goal is to apply the out-of-the-box nnU-Net, as it is intended to
be used, to the glacier front detection and to the glacier zone segmentation, which is
represented by the first two columns in Figure 5.1. 5-fold cross-validation is used to
reduce the bias of the weight initialization and the bias of the data split into training
and validation sets. Every column in Figure 5.1 represents five newly trained U-Nets
with common hyperparameters but different weight initializations. The evaluations of the
individual models are averaged to get a robust measure independent of weight initialization
and data split.
<img src="Figures/experiments.png" width="1024" />
Training the nnU-Net directly on the front labels is the most straightforward approach to
get a method for calving front detection. The label of the calving front is dilated to a width
of five pixels. In preliminary experiments, the dilation has been shown to make the predictions
more robust. For the training with zone labels, the evaluation script includes the extraction of
the boundary between ocean and glacier, which is described in more detail in Section 5.2.
The next approach is to train the U-Net on the zone and front labels simultaneously.
Two architectures are compared: the early-branching and the late-branching network
described in Section 3. The fifth experiment of this work extracts the boundaries of the
glacier zone to all other zones as a third segmentation task for the U-Net (see Figure 5.2),
in contrast to the calving front, which is only a particular part of the boundary
between glacier and ocean. The label of the glacier boundaries was extracted from the zone
label: all glacier pixels with a neighbouring rock or shadow pixel are assigned to the glacier
boundary. The hypothesis is that providing more information about the same domain
benefits the performance of the U-Net on the individual tasks. The last experiment fuses
the two labels by creating an additional class in the zone label associated with the glacier front.
Because the front line has a width of only five pixels, the other zone classes are barely impaired.
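The two label operations mentioned above, the extraction of the glacier boundary and the dilation of a thin line, can be sketched as follows. The class codes, the 4-neighbourhood, and the square dilation window are assumptions made for this illustration and are not taken from the scripts of this repository.

```python
OCEAN, GLACIER, ROCK, SHADOW = 0, 1, 2, 3  # hypothetical class codes
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-neighbourhood


def glacier_boundary(zones):
    """Mark glacier pixels that touch a rock or shadow pixel."""
    height, width = len(zones), len(zones[0])
    boundary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if zones[y][x] != GLACIER:
                continue
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and zones[ny][nx] in (ROCK, SHADOW):
                    boundary[y][x] = 1
                    break
    return boundary


def dilate(mask, radius=2):
    """Grow every foreground pixel into a square of side 2 * radius + 1."""
    height, width = len(mask), len(mask[0])
    dilated = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        dilated[ny][nx] = 1
    return dilated


zones = [
    [ROCK, GLACIER, GLACIER, OCEAN],
    [ROCK, GLACIER, GLACIER, OCEAN],
    [SHADOW, SHADOW, GLACIER, OCEAN],
]
boundary = glacier_boundary(zones)

# A one-pixel-wide vertical line becomes five pixels wide with radius 2.
thin_line = [[1 if x == 3 else 0 for x in range(7)] for _ in range(3)]
wide_line = dilate(thin_line, radius=2)
print(boundary, wide_line)
```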
After the samples are converted and stored in the correct directory, the first nnU-Net
script preprocesses the data, takes a fingerprint of the dataset, and generates a corresponding
plan for the training. The training plan contains the number of layers, the kernel size for every
convolutional layer, the patch size, and the batch size. For the glacier dataset and an NVIDIA RTX 3080
with 12 GB memory, the resulting network architecture has nine encoder blocks and eight
decoder blocks (see Figure 3.8). Considering that every block has two convolutional layers,
the network architecture is relatively deep compared to the U-Net presented in [Ron+15].
Deep networks usually suffer from vanishing gradients. In this U-Net with 34 convolutional layers,
vanishing gradients are avoided by using deep supervision. Deep supervision is
explained in more detail in Section 3.6.2. The kernels of all convolutional layers have a
size of 3x3. During training, one batch contains two images. Each image has a patch size
of 1024 x 896 pixels. The second nnU-Net script trains the network and stores the trained
models. The U-Net is trained with an SGD optimizer with an initial learning rate of 0.01,
a Nesterov momentum of 0.99, and a weight decay of 3e-5. Training one epoch took
between 100 s and 160 s. The nnU-Net uses early stopping, but due to limited resources
the maximum number of epochs (500) is reached in every training. The common way to
define one epoch is to iterate over every sample of the training set. The nnU-Net instead uses a fixed
number of iterations (250) to define one epoch. In every iteration, the batch is sampled depending
on the class distribution of the samples to counteract class imbalance. After the training,
the final model is used to predict the test set. The predictions of the test set are stored in
NIfTI files. After the test predictions are converted back to PNG, the results are evaluated.
The third and last nnU-Net script executes the inference.
A visualization of the training progress of the third experiment with a late-branching
architecture is shown in the GIF below. The GIF shows a random sample of the training set.
The predictions of the nnU-Net after different numbers of epochs are superimposed on the
input image. In epoch 0, the classes are randomly assigned to the pixels. This leads to a
noisy pattern in which all classes are equally distributed. After a few epochs, the class
distribution of the prediction is already close to the target distribution: a small number
of pixels is classified as glacier front and a large number of pixels is classified as glacier.
The ocean classifications form large clusters, but some of them are falsely located in the
glacier zone. In the end, the calving front and the ocean are classified correctly;
only some parts of the glacier are classified as rock and vice versa. Visually, the predictions
are similar to the target.
<img src="create_plots_new/output/overlay.gif" width="412" /> <img src="create_plots_new/output/target (copy).png" width="412" />
The evaluation metric measures how accurately, precisely, and robustly the method detects the
position of the calving front. Additionally, the precision of the glacier zone segmentation is
meaningful information. The mean distance between the front pixels of the label and the
front pixels of the prediction is used to evaluate the calving front detection. For every pixel
in the label front Y, the distance to the closest pixel in the predicted front X is determined.
Additionally, for every pixel in the predicted front X, the distance to its closest pixel in the
label front Y is determined. Both sets of distances are averaged and taken as the mean distance between
the two lines.
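The mean distance between two fronts can be sketched as follows. The sketch assumes that the distances of both directions are pooled before averaging and that pixel distances are converted to metres with a single pixel size; the function name and the example coordinates are hypothetical, and the evaluation script of this repository may differ in its details.

```python
import math


def mean_distance_error(label_front, predicted_front, pixel_size_m=1.0):
    """Symmetric mean distance between two sets of (row, column) front pixels."""
    if not label_front or not predicted_front:
        raise ValueError("both fronts must contain at least one pixel")

    def nearest(pixel, others):
        return min(math.dist(pixel, other) for other in others)

    label_to_prediction = [nearest(p, predicted_front) for p in label_front]
    prediction_to_label = [nearest(q, label_front) for q in predicted_front]
    total = sum(label_to_prediction) + sum(prediction_to_label)
    return pixel_size_m * total / (len(label_to_prediction) + len(prediction_to_label))


# Hypothetical fronts: two vertical lines that are three pixels apart.
label_front = [(y, 0) for y in range(5)]
predicted_front = [(y, 3) for y in range(5)]
error_m = mean_distance_error(label_front, predicted_front, pixel_size_m=7.0)
print(error_m)
```

Measuring in both directions penalizes a prediction that covers only part of the front as well as one that extends far beyond it.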
<img src="Figures/evaluation.png" width="700" />
## 5. Results
The evaluation metrics described above show that both tasks achieve higher
accuracy with MTL compared to Single-Task Learning (STL). In Figure 6.1, the front
delineation error of every experiment is compared. The STL approach that is trained on
the front label has a front delineation error of 1103 ± 72 m, and the STL approach that is trained
on the zone label has a front delineation error of 1184 ± 225 m. The main difference between the
STL experiments is that the variance of the performance of the trained models is higher
when trained on the zone labels.
<img src="Figures/fronterror.png" width="1024" />
<img src="Figures/season.png" width="1024" />
The distribution of the test set predictions is plotted in Figure 6.4. In the first row, all
122 test samples are drawn as dots. The median is the middle line in the orange rectangle,
and the dashed line represents the mean. The x-axis has a logarithmic scale; otherwise, the
outliers would dominate the plot. The rectangle reaches from the first quartile to the third
quartile. Each quartile contains 25 % of the data points. The rows below represent the
samples captured during different seasons. The test set contains two glaciers: Mapple and
COL. The glaciers are located in different hemispheres; therefore, the winter and summer
months are different for each glacier. Winter in the northern hemisphere is from October
to March, and winter in the southern hemisphere is from April to August. The front
predictions of the samples captured during summer have a lower mean error (458 ± 1060 m)
than those of the samples captured during the winter months (996 ± 1683 m). However, the medians
are more similar, with 133 m in the summer months and 185 m in the winter months.
<img src="Figures/glacier.png" width="1024" />
In this figure, the distribution of the prediction errors is divided between the two glaciers. The
front delineation error for the calving front of Mapple is, on average, 127 ± 107 m, while
the mean error of COL is 1184 ± 1761 m. The high mean error of COL is caused by a group of predictions with an
error > 2000 m. The median value is 275 m for COL and 97 m for Mapple.
<img src="Figures/satellite.png" width="1024" />
In this figure, the front delineation error is grouped by satellite. The predictions of
samples created by ERS, ENVISAT, PALSAR, and TDX have a similar average error
between 150 m and 300 m. The predictions of samples created by TSX are more precise,
with 68 ± 59 m, and the predictions of samples created by S1 are less precise, with 2060 ± 2139 m.
Most test samples are captured by TSX, TDX, and S1. TSX and TDX have a resolution of
6 to 7 m, while S1 has a resolution of 20 m.
<img src="Figures/results.png" width="1024" />
Calving front prediction of COL on 3.9.2011, 22.6.2014, and 11.2.2016 taken by
TDX with 7 m²/pixel resolution; label (blue), prediction (yellow), overlap (magenta).
<img src="Figures/results_mapple.png" width="1024" />
(a) Glacier images taken by ERS (20 m²/pixel)
on 5.2.2007, 20.3.2010, and 8.9.2017.
(b) Glacier images taken by TSX (7 m²/pixel)
on 4.11.2008, 2.11.2009, and 2.8.2013.
Figure 6.9: Calving front prediction of Mapple Glacier; label (blue), prediction (yellow),
overlap (magenta), bounding box (cyan).
All plots are generated by the files in the directory create_plots_new or by hand.
# vvv Readme of the original git project vvv
**[2020_10_21] Update:** We now have documentation for [common questions](documentation/common_questions.md) and
[common issues](documentation/common_problems_and_solutions.md). We now also provide [reference epoch times for
several datasets and tips on how to identify bottlenecks](documentation/expected_epoch_times.md).
Please read these documents before opening a new issue!
# nnU-Net
In 3D biomedical image segmentation, dataset properties like imaging modality, image sizes, voxel spacings, class
ratios etc vary drastically.
For example, images in
the [Liver and Liver Tumor Segmentation Challenge dataset](https://competitions.codalab.org/competitions/17094)
are computed tomography (CT) scans, about 512x512x512 voxels large, have isotropic voxel spacings and their
intensity values are quantitative (Hounsfield Units).
The [Automated Cardiac Diagnosis Challenge dataset](https://acdc.creatis.insa-lyon.fr/) on the other hand shows cardiac
structures in cine MRI with a typical image shape of 10x320x320 voxels, highly anisotropic voxel spacings and
qualitative intensity values. In addition, the ACDC dataset suffers from slice misalignments and a heterogeneity of
out-of-plane spacings which can cause severe interpolation artifacts if not handled properly.
In current research practice, segmentation pipelines are designed manually and with one specific dataset in mind.
Hereby, many pipeline settings depend directly or indirectly on the properties of the dataset
and display a complex co-dependence: image size, for example, affects the patch size, which in
turn affects the required receptive field of the network, a factor that itself influences several other
hyperparameters in the pipeline. As a result, pipelines that were developed on one (type of) dataset are inherently
incompatible with other datasets in the domain.
**nnU-Net is the first segmentation method that is designed to deal with the dataset diversity found in the domain. It
condenses and automates the key decisions for designing a successful segmentation pipeline for any given dataset.**
nnU-Net makes the following contributions to the field:
1. **Standardized baseline:** nnU-Net is the first standardized deep learning benchmark in biomedical segmentation.
Without manual effort, researchers can compare their algorithms against nnU-Net on an arbitrary number of datasets
to provide meaningful evidence for proposed improvements.
2. **Out-of-the-box segmentation method:** nnU-Net is the first plug-and-play tool for state-of-the-art biomedical
segmentation. Inexperienced users can use nnU-Net out of the box for their custom 3D segmentation problem without
need for manual intervention.
3. **Framework:** nnU-Net is a framework for fast and effective development of segmentation methods. Due to its modular
structure, new architectures and methods can easily be integrated into nnU-Net. Researchers can then benefit from its
generic nature to roll out and evaluate their modifications on an arbitrary number of datasets in a
standardized environment.
For more information about nnU-Net, please read the following paper:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2020). nnU-Net: a self-configuring method
for deep learning-based biomedical image segmentation. Nature Methods, 1-9.
Please also cite this paper if you are using nnU-Net for your research!
# Table of Contents
- [Installation](#installation)
- [Usage](#usage)
  * [How to run nnU-Net on a new dataset](#how-to-run-nnu-net-on-a-new-dataset)
    + [Dataset conversion](#dataset-conversion)
    + [Experiment planning and preprocessing](#experiment-planning-and-preprocessing)
    + [Model training](#model-training)
      - [2D U-Net](#2d-u-net)
      - [3D full resolution U-Net](#3d-full-resolution-u-net)
      - [3D U-Net cascade](#3d-u-net-cascade)
        * [3D low resolution U-Net](#3d-low-resolution-u-net)
        * [3D full resolution U-Net](#3d-full-resolution-u-net-1)
      - [Multi GPU training](#multi-gpu-training)
    + [Identifying the best U-Net configuration](#identifying-the-best-u-net-configuration)
    + [Run inference](#run-inference)
  * [How to run inference with pretrained models](#how-to-run-inference-with-pretrained-models)
  * [Examples](#examples)
- [Extending/Changing nnU-Net](#extending-or-changing-nnu-net)
- [Information on run time and potential performance bottlenecks.](#information-on-run-time-and-potential-performance-bottlenecks)
- [Common questions and issues](#common-questions-and-issues)
# Installation
nnU-Net has been tested on Linux (Ubuntu 16, 18 and 20; centOS, RHEL). We do not provide support for other operating
systems.
nnU-Net requires a GPU! For inference, the GPU should have 4 GB of VRAM. For training nnU-Net models the GPU should have
at least 10 GB (popular non-datacenter options are the RTX 2080ti, RTX 3080 or RTX 3090). Due to the use of automated mixed
precision, fastest training times are achieved with the Volta architecture (Titan V, V100 GPUs) when installing pytorch
the easy way. Since pytorch comes with cuDNN 7.6.5 and tensor core acceleration on Turing GPUs is not supported for 3D
convolutions in this version, you will not get the best training speeds on Turing GPUs. You can remedy that by compiling
pytorch from source
(see [here](https://github.com/pytorch/pytorch#from-source)) using cuDNN 8.0.2 or newer. This will unlock Turing GPUs
(RTX 2080ti, RTX 6000) for automated mixed precision training with 3D convolutions and make the training blistering
fast as well. Note that future versions of pytorch may include cuDNN 8.0.2 or newer by default and
compiling from source will not be necessary.
We don't know the speed of Ampere GPUs with vanilla vs self-compiled pytorch yet - this section will be updated as
soon as we know.
For training, we recommend a strong CPU to go along with the GPU. At least 6 CPU cores (12 threads) are recommended. CPU
requirements are mostly related to data augmentation and scale with the number of input channels. They are thus higher
for datasets like BraTS which use 4 image modalities and lower for datasets like LiTS which only uses CT images.
We very strongly recommend you install nnU-Net in a virtual environment.
[Here is a quick how-to for Ubuntu.](https://linoxide.com/linux-how-to/setup-python-virtual-environment-ubuntu/)
If you choose to compile pytorch from source, you will need to use conda instead of pip. In that case, please set the
environment variable OMP_NUM_THREADS=1 (preferably in your bashrc using `export OMP_NUM_THREADS=1`). This is important!
Python 2 is deprecated and not supported. Please make sure you are using Python 3.
1) Install [PyTorch](https://pytorch.org/get-started/locally/). You need at least version 1.6
2) Install nnU-Net depending on your use case:
    1) For use as **standardized baseline**, **out-of-the-box segmentation algorithm** or for running **inference with
       pretrained models**:

       ```pip install nnunet```

    2) For use as integrative **framework** (this will create a copy of the nnU-Net code on your computer so that you
       can modify it as needed):
       ```bash
       git clone https://github.com/MIC-DKFZ/nnUNet.git
       cd nnUNet
       pip install -e .
       ```
3) nnU-Net needs to know where you intend to save raw data, preprocessed data and trained models. For this you need to
set a few of environment variables. Please follow the instructions [here](documentation/setting_up_paths.md).
4) (OPTIONAL) Install [hiddenlayer](https://github.com/waleedka/hiddenlayer). hiddenlayer enables nnU-net to generate
plots of the network topologies it generates (see [Model training](#model-training)). To install hiddenlayer,
run the following commands:
```bash
pip install --upgrade git+https://github.com/FabianIsensee/hiddenlayer.git@more_plotted_details#egg=hiddenlayer
```
Installing nnU-Net will add several new commands to your terminal. These commands are used to run the entire nnU-Net
pipeline. You can execute them from any location on your system. All nnU-Net commands have the prefix `nnUNet_` for
easy identification.
Note that these commands simply execute python scripts. If you installed nnU-Net in a virtual environment, this
environment must be activated when executing the commands.
All nnU-Net commands have a `-h` option which gives information on how to use them.
A typical installation of nnU-Net can be completed in less than 5 minutes. If pytorch needs to be compiled from source
(which is what we currently recommend when using Turing GPUs), this can extend to more than an hour.
# Usage
To familiarize yourself with nnU-Net we recommend you have a look at the [Examples](#Examples) before you start with
your own dataset.
## How to run nnU-Net on a new dataset
Given some dataset, nnU-Net fully automatically configures an entire segmentation pipeline that matches its properties.
nnU-Net covers the entire pipeline, from preprocessing to model configuration, model training, postprocessing
all the way to ensembling. After running nnU-Net, the trained model(s) can be applied to the test cases for inference.
### Dataset conversion
nnU-Net expects datasets in a structured format. This format closely (but not entirely) follows the data structure of
the [Medical Segmentation Decathlon](http://medicaldecathlon.com/). Please read
[this](documentation/dataset_conversion.md) for information on how to convert datasets to be compatible with nnU-Net.
### Experiment planning and preprocessing
As a first step, nnU-Net extracts a dataset fingerprint (a set of dataset-specific properties such as
image sizes, voxel spacings, intensity information etc). This information is used to create three U-Net configurations:
a 2D U-Net, a 3D U-Net that operates on full resolution images as well as a 3D U-Net cascade where the first U-Net
creates a coarse segmentation map in downsampled images which is then refined by the second U-Net.
Provided that the requested raw dataset is located in the correct
folder (`nnUNet_raw_data_base/nnUNet_raw_data/TaskXXX_MYTASK`,
also see [here](documentation/dataset_conversion.md)), you can run this step with the following command:
```bash
nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity
```
`XXX` is the integer identifier associated with your Task name `TaskXXX_MYTASK`. You can pass several task IDs at once.
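As a small illustration of this naming convention, the sketch below extracts the integer ID from a task folder name and builds the raw-data path mentioned above. The helper names `task_id_from_name` and `raw_data_folder` are hypothetical and not part of nnU-Net, and the three-digit pattern is an assumption derived from the `TaskXXX_MYTASK` convention.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: "Task" + three digits + "_" + free-form name.
TASK_PATTERN = re.compile(r"^Task(\d{3})_(.+)$")


def task_id_from_name(task_name: str) -> int:
    """Return the integer ID encoded in a name such as 'Task029_LiTS'."""
    match = TASK_PATTERN.match(task_name)
    if match is None:
        raise ValueError(f"not a valid task name: {task_name!r}")
    return int(match.group(1))


def raw_data_folder(raw_data_base: str, task_name: str) -> PurePosixPath:
    """Build <raw_data_base>/nnUNet_raw_data/TaskXXX_MYTASK (hypothetical helper)."""
    task_id_from_name(task_name)  # raises if the name does not follow the convention
    return PurePosixPath(raw_data_base) / "nnUNet_raw_data" / task_name


if __name__ == "__main__":
    print(task_id_from_name("Task029_LiTS"))
    print(raw_data_folder("/data/nnUNet_raw_data_base", "Task029_LiTS"))
```

The integer returned by `task_id_from_name` is the value you would pass to `-t`.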
Running `nnUNet_plan_and_preprocess` will populate your folder with preprocessed data. You will find the output in
nnUNet_preprocessed/TaskXXX_MYTASK. `nnUNet_plan_and_preprocess` creates subfolders with preprocessed data for the 2D
U-Net as well as all applicable 3D U-Nets. It will also create 'plans' files (with the ending .pkl) for the 2D and
3D configurations. These files contain the generated segmentation pipeline configuration and will be read by the
nnUNetTrainer (see below). Note that the preprocessed data folder only contains the training cases.
The test images are not preprocessed (they are not looked at at all!). Their preprocessing happens on the fly during
inference.
`--verify_dataset_integrity` should be used at least the first time the command is run on a given dataset. This will
execute some checks on the dataset to ensure that it is compatible with nnU-Net. If this check has passed once, it can be
omitted in future runs. If you adhere to the dataset conversion guide (see above) then this should pass without issues :-)
Note that `nnUNet_plan_and_preprocess` accepts several additional input arguments. Running `-h` will list all of them
along with a description. If you run out of RAM during preprocessing, you may want to adapt the number of processes
used with the `-tl` and `-tf` options.
After `nnUNet_plan_and_preprocess` is completed, the U-Net configurations have been created and a preprocessed copy
of the data will be located at nnUNet_preprocessed/TaskXXX_MYTASK.
Extraction of the dataset fingerprint can take from a couple of seconds to several minutes depending on the properties
of the segmentation task. Pipeline configuration given the extracted fingerprint is nearly instantaneous (a couple
of seconds). Preprocessing depends on image size and how powerful the CPU is. It can take between seconds and several
tens of minutes.
### Model training
nnU-Net trains all U-Net configurations in a 5-fold cross-validation. This enables nnU-Net to determine the
postprocessing and ensembling (see next step) on the training dataset. Per default, all U-Net configurations need to
be run on a given dataset. There are, however, situations in which only some configurations (and maybe even without
running the cross-validation) are desired. See [FAQ](documentation/common_questions.md) for more information.
Note that not all U-Net configurations are created for all datasets. In datasets with small image sizes, the U-Net
cascade is omitted because the patch size of the full resolution U-Net already covers a large part of the input images.
Training models is done with the `nnUNet_train` command. The general structure of the command is:
```bash
nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD --npz (additional options)
```
CONFIGURATION is a string that identifies the requested U-Net configuration. TRAINER_CLASS_NAME is the name of the
model trainer. If you implement custom trainers (nnU-Net as a framework) you can specify your custom trainer here.
TASK_NAME_OR_ID specifies what dataset should be trained on and FOLD specifies which fold of the 5-fold cross-validation
is trained.
nnU-Net stores a checkpoint every 50 epochs. If you need to continue a previous training, just add a `-c` to the
training command.
IMPORTANT: `--npz` makes the models save the softmax outputs during the final validation. It should only be used for
trainings where you plan to run `nnUNet_find_best_configuration` afterwards (this is nnU-Net's automated selection of the
best performing (ensemble of) configuration(s), see below). If you are developing new trainer classes you may not need the
softmax predictions and should therefore omit the `--npz` flag. Exported softmax predictions are very large and therefore
can take up a lot of disk space.
If you ran initially without the `--npz` flag but now require the softmax predictions, simply run
```bash
nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD -val --npz
```
to generate them. This will only rerun the validation, not the training.
See `nnUNet_train -h` for additional options.
#### 2D U-Net
For FOLD in [0, 1, 2, 3, 4], run:
```bash
nnUNet_train 2d nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
```
#### 3D full resolution U-Net
For FOLD in [0, 1, 2, 3, 4], run:
```bash
nnUNet_train 3d_fullres nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
```
#### 3D U-Net cascade
##### 3D low resolution U-Net
For FOLD in [0, 1, 2, 3, 4], run:
```bash
nnUNet_train 3d_lowres nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
```
##### 3D full resolution U-Net
For FOLD in [0, 1, 2, 3, 4], run:
```bash
nnUNet_train 3d_cascade_fullres nnUNetTrainerV2CascadeFullRes TaskXXX_MYTASK FOLD --npz
```
Note that the 3D full resolution U-Net of the cascade requires the five folds of the low resolution U-Net to be
completed beforehand!
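The per-fold commands above can be generated programmatically. The following is a minimal sketch that only assembles the command strings shown in this section, in an order that respects the cascade dependency (all low resolution folds before the cascade's full resolution stage). The function names are hypothetical helpers, not part of nnU-Net.

```python
from typing import List

FOLDS = (0, 1, 2, 3, 4)


def fold_commands(configuration: str, trainer: str, task: str) -> List[str]:
    """One `nnUNet_train ... --npz` command string per cross-validation fold."""
    return [
        f"nnUNet_train {configuration} {trainer} {task} {fold} --npz"
        for fold in FOLDS
    ]


def cascade_commands(task: str) -> List[str]:
    """Low resolution folds first, then the full resolution stage of the cascade."""
    lowres = fold_commands("3d_lowres", "nnUNetTrainerV2", task)
    fullres = fold_commands(
        "3d_cascade_fullres", "nnUNetTrainerV2CascadeFullRes", task
    )
    return lowres + fullres


if __name__ == "__main__":
    for command in cascade_commands("TaskXXX_MYTASK"):
        print(command)
```

The printed lines can be pasted into a shell or handed to a job scheduler; the sketch itself does not start any training.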
The trained models will be written to the RESULTS_FOLDER/nnUNet folder. Each training obtains an automatically generated
output folder name:
RESULTS_FOLDER/nnUNet/CONFIGURATION/TaskXXX_MYTASKNAME/TRAINER_CLASS_NAME__PLANS_FILE_NAME/FOLD
For Task002_Heart (from the MSD), for example, this looks like this:

    RESULTS_FOLDER/nnUNet/
    ├── 2d
    │   └── Task02_Heart
    │       └── nnUNetTrainerV2__nnUNetPlansv2.1
    │           ├── fold_0
    │           ├── fold_1
    │           ├── fold_2
    │           ├── fold_3
    │           └── fold_4
    ├── 3d_cascade_fullres
    ├── 3d_fullres
    │   └── Task02_Heart
    │       └── nnUNetTrainerV2__nnUNetPlansv2.1
    │           ├── fold_0
    │           │   ├── debug.json
    │           │   ├── model_best.model
    │           │   ├── model_best.model.pkl
    │           │   ├── model_final_checkpoint.model
    │           │   ├── model_final_checkpoint.model.pkl
    │           │   ├── network_architecture.pdf
    │           │   ├── progress.png
    │           │   └── validation_raw
    │           │       ├── la_007.nii.gz
    │           │       ├── la_007.pkl
    │           │       ├── la_016.nii.gz
    │           │       ├── la_016.pkl
    │           │       ├── la_021.nii.gz
    │           │       ├── la_021.pkl
    │           │       ├── la_024.nii.gz
    │           │       ├── la_024.pkl
    │           │       ├── summary.json
    │           │       └── validation_args.json
    │           ├── fold_1
    │           ├── fold_2
    │           ├── fold_3
    │           └── fold_4
    └── 3d_lowres

Note that 3d_lowres and 3d_cascade_fullres are not populated because this dataset did not trigger the cascade. In each
model training output folder (each of the fold_x folders, 10 in total here), the following files will be created (only
shown for one folder above for brevity):
- debug.json: Contains a summary of blueprint and inferred parameters used for training this model. Not easy to read,
but very useful for debugging ;-)
- model_best.model / model_best.model.pkl: checkpoint files of the best model identified during training. Not used right
now.
- model_final_checkpoint.model / model_final_checkpoint.model.pkl: checkpoint files of the final model (after training
has ended). This is what is used for both validation and inference.
- network_architecture.pdf (only if hiddenlayer is installed!): a pdf document with a figure of the network architecture
in it.
- progress.png: A plot of the training (blue) and validation (red) loss during training. Also shows an approximation of
the evaluation metric (green). This approximation is the average Dice score of the foreground classes. It should,
however, only be taken with a grain of salt because it is computed on randomly drawn patches from the validation
data at the end of each epoch, and the aggregation of TP, FP and FN for the Dice computation treats the patches as if
they all originate from the same volume ('global Dice'; we do not compute a Dice for each validation case and then
average over all cases but pretend that there is only one validation case from which we sample patches). The reason
for this is that the 'global Dice' is easy to compute during training and is still quite useful to evaluate whether a
model is training at all or not. A proper validation is run at the end of the training.
- validation_raw: in this folder are the predicted validation cases after the training has finished. The summary.json
contains the validation metrics (a mean over all cases is provided at the end of the file).
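
To make the difference between the 'global Dice' in progress.png and a per-case average concrete, here is a minimal sketch for a single foreground class. It is not nnU-Net's implementation; the function names and the example counts are made up for illustration, and returning 1.0 for an empty denominator is an assumption of this sketch.

```python
from typing import List, Tuple

# (true positives, false positives, false negatives) for one foreground class
Counts = Tuple[int, int, int]


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice = 2*TP / (2*TP + FP + FN); 1.0 if the denominator is 0 (assumption)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


def global_dice(patches: List[Counts]) -> float:
    """Sum TP, FP and FN over all patches first, then compute a single Dice."""
    tp = sum(c[0] for c in patches)
    fp = sum(c[1] for c in patches)
    fn = sum(c[2] for c in patches)
    return dice(tp, fp, fn)


def mean_per_case_dice(cases: List[Counts]) -> float:
    """Compute one Dice per case, then average over the cases."""
    return sum(dice(*c) for c in cases) / len(cases)


if __name__ == "__main__":
    # Made-up counts: one large, well-segmented structure and one small, poor one.
    counts = [(90, 5, 5), (1, 4, 5)]
    print("global Dice:       ", round(global_dice(counts), 4))
    print("mean per-case Dice:", round(mean_per_case_dice(counts), 4))
```

With these counts the pooled value is dominated by the large structure, which is why the green curve is a useful training signal but no substitute for the final validation in summary.json.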
During training it is often useful to watch the progress. We therefore recommend that you have a look at the generated
progress.png when running the first training. It will be updated after each epoch.
Training times largely depend on the GPU. The smallest GPU we recommend for training is the Nvidia RTX 2080ti. With
this GPU (and pytorch compiled with cuDNN 8.0.2), all network trainings take less than 2 days.
#### Multi GPU training
**Multi GPU training is experimental and NOT RECOMMENDED!**
nnU-Net supports two different multi-GPU implementations: DataParallel (DP) and Distributed Data Parallel (DDP)
(but currently only on one host!). DDP is faster than DP and should be preferred if possible. However, if you did not
install nnunet as a framework (meaning you used the `pip install nnunet` variant), DDP is not available. It requires a
different way of calling the correct python script (see below) which we cannot support from our terminal commands.
Distributed training currently only works for the basic trainers (2D, 3D full resolution and 3D low resolution) and not
for the second, high resolution U-Net of the cascade. The reason for this is that distributed training requires some
changes to the network and loss function, requiring a new nnUNet trainer class. This is, as of now, simply not
implemented for the cascade, but may be added in the future.
To run DataParallel (DP) training, use the following command:
```bash
CUDA_VISIBLE_DEVICES=0,1,2... nnUNet_train_DP CONFIGURATION nnUNetTrainerV2_DP TASK_NAME_OR_ID FOLD -gpus GPUS --dbs
```
Note that nnUNetTrainerV2 was replaced with nnUNetTrainerV2_DP. Just like before, CONFIGURATION can be 2d, 3d_lowres or
3d_fullres. TASK_NAME_OR_ID refers to the task you would like to train and FOLD is the fold of the cross-validation.
GPUS (integer value) specifies the number of GPUs you wish to train on. To specify which GPUs you want to use, please
make use of the CUDA_VISIBLE_DEVICES environment variable to specify the GPU ids (specify as many as you configure with
-gpus GPUS). --dbs, if set, will distribute the batch size across GPUs. So if nnUNet configures a batch size of 2 and you
run on 2 GPUs, each GPU will run with a batch size of 1. If you omit --dbs, each GPU will run with the full batch size
(2 for each GPU in this example for a total of batch size 4).
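The batch size arithmetic behind `--dbs` can be written down in a few lines. This is only a sketch of the rule described above, not code from nnU-Net; `batch_sizes` is a hypothetical helper, and rejecting batch sizes that do not divide evenly across GPUs is an assumption of this sketch, since that case is not covered here.

```python
from typing import Tuple


def batch_sizes(configured: int, num_gpus: int, dbs: bool) -> Tuple[int, int]:
    """Return (per-GPU batch size, total batch size) for a DP run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if dbs:
        # Assumption: only evenly divisible batch sizes are handled in this sketch.
        if configured % num_gpus != 0:
            raise ValueError("configured batch size is not divisible by num_gpus")
        per_gpu = configured // num_gpus
    else:
        # Without --dbs every GPU processes the full configured batch size.
        per_gpu = configured
    return per_gpu, per_gpu * num_gpus


if __name__ == "__main__":
    print(batch_sizes(configured=2, num_gpus=2, dbs=True))
    print(batch_sizes(configured=2, num_gpus=2, dbs=False))
```

The two calls reproduce the example from the text: a per-GPU batch size of 1 with `--dbs`, and 2 per GPU (4 in total) without it.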
To run the DDP training you must have nnU-Net installed as a framework. Your current working directory must be the
nnunet folder (the one that has the dataset_conversion, evaluation, experiment_planning, ... subfolders!). You can then
run the DDP training with the following command:
```bash
CUDA_VISIBLE_DEVICES=0,1,2... python -m torch.distributed.launch --master_port=XXXX --nproc_per_node=Y run/run_training_DDP.py CONFIGURATION nnUNetTrainerV2_DDP TASK_NAME_OR_ID FOLD --dbs
```
XXXX must be an open port for process-process communication (something like 4321 will do on most systems). Y is the
number of GPUs you wish to use. Remember that we do not (yet) support distributed training across compute nodes. This
all happens on the same system. Again, you can use CUDA_VISIBLE_DEVICES=0,1,2 to control what GPUs are used.
If you run more than one DDP training on the same system (say you have 4 GPUs and you run two trainings with 2 GPUs each)
you need to specify a different --master_port for each training!
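As an illustration of running several DDP trainings side by side, the sketch below assembles the command shown above for each GPU group and gives every training its own port. It only builds strings and does not launch anything. The helper names are hypothetical, and picking consecutive ports starting at a base port is an assumption; any set of distinct open ports would do.

```python
from typing import List, Sequence


def ddp_command(gpu_ids: Sequence[int], master_port: int,
                configuration: str, task: str, fold: int) -> str:
    """Assemble the DDP launch command for one training."""
    devices = ",".join(str(gpu) for gpu in gpu_ids)
    return (
        f"CUDA_VISIBLE_DEVICES={devices} python -m torch.distributed.launch "
        f"--master_port={master_port} --nproc_per_node={len(gpu_ids)} "
        f"run/run_training_DDP.py {configuration} nnUNetTrainerV2_DDP "
        f"{task} {fold} --dbs"
    )


def concurrent_ddp_commands(gpu_groups: Sequence[Sequence[int]], base_port: int,
                            configuration: str, task: str,
                            folds: Sequence[int]) -> List[str]:
    """One command per GPU group, each with a distinct --master_port."""
    return [
        ddp_command(gpus, base_port + index, configuration, task, fold)
        for index, (gpus, fold) in enumerate(zip(gpu_groups, folds))
    ]


if __name__ == "__main__":
    # Two trainings on a 4-GPU machine, 2 GPUs each (folds 0 and 1).
    for command in concurrent_ddp_commands(
        [(0, 1), (2, 3)], 4321, "3d_fullres", "TaskXXX_MYTASK", [0, 1]
    ):
        print(command)
```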
*IMPORTANT!*
Multi-GPU training results in models that cannot be used for inference easily (as said above, all of this is
experimental ;-) ).
After finishing the training of all folds, run `nnUNet_change_trainer_class` on the folder where the trained model is
(see `nnUNet_change_trainer_class -h` for instructions). After that you can run inference.
### Identifying the best U-Net configuration
Once all models are trained, use the following
command to automatically determine what U-Net configuration(s) to use for test set prediction:
```bash
nnUNet_find_best_configuration -m 2d 3d_fullres 3d_lowres 3d_cascade_fullres -t XXX --strict
```
(all 5 folds need to be completed for all specified configurations!)
On datasets for which the cascade was not configured, use `-m 2d 3d_fullres` instead. If you wish to only explore some
subset of the configurations, you can specify that with the `-m` option. We recommend setting the
`--strict` (crash if one of the requested configurations is
missing) flag. Additional options are available (use `-h` for help).
### Run inference
Remember that the data located in the input folder must adhere to the format specified
[here](documentation/data_format_inference.md).
`nnUNet_find_best_configuration` will print a string to the terminal with the inference commands you need to use.
The easiest way to run inference is to simply use these commands.
If you wish to manually specify the configuration(s) used for inference, use the following commands:
For each of the desired configurations, run:
```bash
nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t TASK_NAME_OR_ID -m CONFIGURATION --save_npz
```
Only specify `--save_npz` if you intend to use ensembling. `--save_npz` will make the command save the softmax
probabilities alongside the predicted segmentation masks, which requires a lot of disk space.
Please select a separate `OUTPUT_FOLDER` for each configuration!
If you wish to run ensembling, you can ensemble the predictions from several configurations with the following command:
```bash
nnUNet_ensemble -f FOLDER1 FOLDER2 ... -o OUTPUT_FOLDER -pp POSTPROCESSING_FILE
```
You can specify an arbitrary number of folders, but remember that each folder needs to contain npz files that were
generated by `nnUNet_predict`. For ensembling you can also specify a file that tells the command how to postprocess.
These files are created when running `nnUNet_find_best_configuration` and are located in the respective trained model
directory
(RESULTS_FOLDER/nnUNet/CONFIGURATION/TaskXXX_MYTASK/TRAINER_CLASS_NAME__PLANS_FILE_IDENTIFIER/postprocessing.json or
RESULTS_FOLDER/nnUNet/ensembles/TaskXXX_MYTASK/ensemble_X__Y__Z--X__Y__Z/postprocessing.json). You can also choose to
not provide a file (simply omit -pp) and nnU-Net will not run postprocessing.
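To give an intuition for why ensembling needs the softmax probabilities rather than the final segmentation masks, here is a minimal sketch. It assumes that the ensemble is a plain average of the class probabilities followed by an argmax per voxel; that assumption, the function name `ensemble_labels` and the toy numbers are illustrative and not taken from the `nnUNet_ensemble` implementation.

```python
from typing import List, Sequence

# One model's output, indexed as probabilities[class_index][voxel_index].
Probabilities = Sequence[Sequence[float]]


def ensemble_labels(models: Sequence[Probabilities]) -> List[int]:
    """Average class probabilities across models, then take the argmax per voxel."""
    num_classes = len(models[0])
    num_voxels = len(models[0][0])
    labels = []
    for voxel in range(num_voxels):
        mean_probs = [
            sum(model[cls][voxel] for model in models) / len(models)
            for cls in range(num_classes)
        ]
        # Index of the class with the highest averaged probability.
        labels.append(max(range(num_classes), key=mean_probs.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy example: 2 classes, 3 voxels, 2 models.
    model_a = [[0.6, 0.4, 0.9], [0.4, 0.6, 0.1]]
    model_b = [[0.1, 0.45, 0.8], [0.9, 0.55, 0.2]]
    print(ensemble_labels([model_a, model_b]))
```

In the toy data, the first model alone would label the first voxel as class 0, while the averaged probabilities favor class 1. This kind of correction is not possible from hard masks alone.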
Note that per default, inference will be done with all available folds. We very strongly recommend you use all 5 folds.
Thus, all 5 folds must have been trained prior to running inference. The list of available folds nnU-Net found will be
printed at the start of the inference.
## How to run inference with pretrained models
Trained models for all challenges we participated in are publicly available. They can be downloaded and installed
directly with nnU-Net. Note that downloading a pretrained model will overwrite other models that were trained with
exactly the same configuration (2d, 3d_fullres, ...), trainer (nnUNetTrainerV2) and plans.
To obtain a list of available models, as well as a short description, run
```bash
nnUNet_print_available_pretrained_models
```
You can then download models by specifying their task name. For the Liver and Liver Tumor Segmentation Challenge,
for example, this would be:
```bash
nnUNet_download_pretrained_model Task029_LiTS
```
After downloading is complete, you can use this model to run [inference](#run-inference). Keep in mind that each of
these models has specific data requirements (Task029_LiTS runs on abdominal CT scans, others require several image
modalities as input in a specific order).
When using the pretrained models you must adhere to the license of the dataset they are trained on! If you run
`nnUNet_download_pretrained_model` you will find a link where you can find the license for each dataset.
## Examples
To get you started we compiled two simple-to-follow examples:
- run a training with the 3d full resolution U-Net on the Hippocampus dataset.
See [here](documentation/training_example_Hippocampus.md).
- run inference with nnU-Net's pretrained models on the Prostate dataset.
See [here](documentation/inference_example_Prostate.md).
Usability not good enough? Let us know!
# Extending or Changing nnU-Net
Please refer to [this](documentation/extending_nnunet.md) guide.
# Information on run time and potential performance bottlenecks.
We have compiled a list of expected epoch times on standardized datasets across many different GPUs. You can use them
to verify that your system is performing as expected. There are also tips on how to identify bottlenecks and what
to do about them.
Click [here](documentation/expected_epoch_times.md).
# Common questions and issues
We have collected solutions to common [questions](documentation/common_questions.md) and
[problems](documentation/common_problems_and_solutions.md). Please consult these documents before you open a new issue.
--------------------
<img src="HIP_Logo.png" width="512px" />
nnU-Net is developed and maintained by the Applied Computer Vision Lab (ACVL) of
the [Helmholtz Imaging Platform](http://helmholtz-imaging.de).