Spaces:

ho11laqe
/

nnUNet_calvingfront_detection

Sleeping

App Files Files Community

nnUNet_calvingfront_detection / readme.md

ho11laqe

init

ecf08bc about 1 year ago

preview code

raw history blame

No virus

53.5 kB

	# Multi-Task Learning for Glacier Segmentation and Calving Front Prediction with the nnU-Net.

	This project contains the script for the experiments that are described in the paper "Out-of-the-box calving front detection method using deep-learning" by Herrmann et al. https://tc.copernicus.org/preprints/tc-2023-34/
	The project was build up on the nnU-Net project by Isensee, F., Jaeger, P. F. (2020) https://github.com/MIC-DKFZ/nnUNet. The folders that are new to the project are marked as "xx_new". I tried to change a minimum number of original files and create new ones, but it was no always feasible.

	## Out-of-the-box claving front detection
	To apply the trained nnU-Net on a set of SAR images for claving front detection, follow the steps below:

	1. Download this repository extract the files https://github.com/ho11laqe/nnUNet_calvingfront_detection.git
	2. Download the pretrained model from Zenodo and extract the zip-file https://zenodo.org/record/7837300#.ZD1OI9IzbUA.
	3. Install the repository
	- Create a new virtual environment with `python3 -m venv /path/to/venv/nnunet` and repace the path with the location,
	where the virtual environment should be installed.
	- Activate the environment with `source /path/to/venv/nnunet/bin/activate`.
	- Install the repository by entering `pip install -e /path/to/extraced/repository/nnunet_clavingfront` and replace the path.
	7. Run the calving front prediction with `bash RUN_CALBINGFRONT_DETECTION.sh -d /path/to/SARimages/ -m /path/to/pretrained/model/` and replace the paths
	with the path to the folder containing the SAR images and path to the pretrained model.

	## 1. Dataset

	The dataset is provided by Gourmelon et al. and can be found [here](https://doi.pangaea.de/10.1594/PANGAEA.940950).
	It contains 681 SAR images of seven glaciers taken by seven different satellites. Two glaciers are located in the
	northern hemisphere and five in the southern hemisphere. The two glaciers on the southern hemisphere are the Columbia
	Glacier in Alaska and the Jacobshavn in Greenland. Both glaciers are
	famous representatives of their regions because they are two of the largest tidewater glaciers in the world. The
	Columbia Glacier has a length of 51 km, and a thickness of 550 m. The glacier has been retreating at a rate of
	approximately 0.6 km per year since 1982. Jacobshaven has a length of 65 km, a thickness of
	2000 m, and retreated 27.4 km between 1902 and 2010. The five glaciers in the southern
	hemisphere are all located at the Arctic Penisula.
	<img src="Figures/studysites.png" width="1024px" />
	<img src="Figures/dataset_numbers.png" width="1024px" />
	Properties of the dataset including list of captured glaciers train-test-split,
	number of images per glacier, and covered area in km.

	The dataset contains two labels for each glacier image. One is a mask of the different
	zones of the glacier (ocean, glacier, radar shadow, rock). The other label contains a 1 pixel
	wide line representing the calving front. A sample of each glacier in the training set with
	its corresponding labels is shown in Figure 2. Predicting the zone mask can be seen as a
	classic segmentation problem. The only specialty is that all pixels are associated with a
	specific class so that there is no general ’background’-class for unclassifiable pixels. Because
	of the high-class imbalance, the calving front delineation is a more difficult task. Fewer
	than 1 % of the pixels are labeled as the front. Additionally, the structure of the class
	region is not a convex hull but a thin line.
	<img src="Figures/dataset.png" width="1024px" />
	Figure 2: Sample images of every glacier in the train set and their corresponding labels.
	The first row shows the front label with black background and a 1 pixel wide white line
	representing the calving front. The second row contains the zone labels with four classes:
	ocean (white), glacier (light gray), rock (dark gray), radar shadow (black).

	Every glacier is captured by multiple satellites for a higher temporal resolution. Mean-
	ing, that recordings of one glacier are captured by different SAR systems with different
	resolutions. In Figure 3 a timeline of the images of each glacier visualizes the observation
	time and frequency of the images. The first two rows show the glacier of the test set.
	<img src="Figures/satellites.png" width="1024px" />
	Figure 3: Distribution of the dataset’s images over time. The samples are grouped by the
	seven glaciers, and colored according to the capturing satellite.

	## 2. nnU-Net
	The nnU-Net by Fabian Isensee et al. [Ise+21] reduces the hyperparameter
	search by taking a fingerprint of the dataset and adjusting hyperparameters accordingly.
	Additionally, there are fixed parameters. These parameters are based on the authors’
	experience and generalize well across a variety of tasks. The structure of the nnU-Net is
	visualized in Figure 4.
	<img src="Figures/nnUnet.png" width="1024px" />
	Figure 4: Illustration of the nnU-Net framework created by Isensee et al. [Ise+21]

	I retraced the pipline of the nnU-Net and created the following visualizations. Figure 5 show the whole pipeline
	including the added python scripts. The data samples and the labels have to be in the Neuroimaging Informatics Technology
	Initiative (NIfTI) file format, separated into test and training samples. The NIfTI file format
	was developed for neuroimaging. The files store 3D scans of brains or other organs. The
	format stores additional information about the orientation of the data, distances between
	the individual pixels/voxels, and layers. Because the nnU-Net was developed for medical
	imaging, it uses this file format.
	<img src="Figures/Pipeline.png" width="1024px" />
	Figure 5: Scripts for conversion between PNG and NIfTI (blue), nnU-Net scripts (purple),
	evaluation scripts (green).

	### 2.1 Preprocessing

	The nnU-Net crops borders before the
	dataset fingerprint is created. While the dataset is perused for black borders, properties of
	every image, including size and spacing, are stored. After the cropping, every sample and
	its corresponding label is stacked into one NIfTI file. Finally, the dataset’s fingerprint is
	created by analyzing the dataset. The fingerprint includes the size and spacing of every
	sample, the number of classes, the imaging modality, and the intensity distribution.
	Based on the dataset’s fingerprint and the available Random Access Memory (RAM)
	of the Graphics Processing Unit (GPU), a plan for the training and architecture of the
	U-Net is created. The hyperparameters concerning the architecture is the patch size and
	the number of layers. Most often, using the whole data sample as the input for the U-Net
	results in a massive number of parameters and will not fit on traditional GPUs. Therefore,
	the image is divided into smaller parts called patches. Their segmentation mask is stitched
	together afterwards to get a segmentation of the whole image. The patch size is initialized
	with the median image shape and iteratively reduced until at least two images can be
	processed in parallel. The number of images passed through the network in parallel is called
	batch size and provides a more stable training. Here a larger patch size is preferred over a
	larger batch size to provide more contextual information for the segmentation. The patch
	size also represents the size of the first and last layer of the U-Net.
	<img src="Figures/preprocessing.png" width="1024px" />
	Figure 6: Plan and preprocessing pipeline including generated files. The python scripts
	are reduced to the important functions.

	### 2.2. Training
	Before the training starts, the network trainer and the network have to be initialized with
	the parameters generated by the previous step. The trainer’s parameters are learning rate,
	loss function, the maximum number of epochs, optimizer, and dataloader. The dataloader
	is responsible for creating the patches, batches, and augmentation of the samples. There
	are 11 augmentation steps in the nnU-Net listed in the Table below.
	<img src="Figures/augmentation.png" width="1024px" />

	In the next step, the network is created based on the generated parameters. The U-Net
	consists of multiple blocks (in this work: nine encoder blocks and eight decoder blocks).
	The encoder block and the decoder block are illustrated in Figure 7. The encoder block
	contains two convolutional layers. Each block is followed by an instance normalization and
	the activation function (leaky rectified linear unit). For instance normalization, the mean
	and variance are calculated for every feature map. Afterwards, it is subtracted by the mean
	and divided by the variance. The decoder block takes as input the output of the previous
	block and the output of the corresponding encoder block. The output of the previous block
	is scaled up with a transpose convolution and then concatenated with the encoder output.
	Then the decoder block is equally structured with the encoder block. The output is used
	by the next layer and the deep supervision part.
	<img src="Figures/unetblocks.png" width="1024px" />
	Figure 7: Illustration of the encoder and decoder blocks that make up the architecture of
	the nnU-Net. The encoder and the decoder contain multiple blocks.

	After everything is initialized, the network is trained to minimize the loss function.
	The loss function of the nnU-Net is the summation of the cross entropy loss. Typically one epoch
	corresponds to feeding every dataset sample to the network. The nnU-Net sets a fixed
	number of iterations (250) to be one epoch. Because the network ensures that at least
	one-third of samples contain a randomly chosen foreground class, this is especially helpful
	for the class imbalance of the front label, where most of the patches do not contain any
	class.
	<img src="Figures/training2.svg" width="1024px" />

	### 2.3 Post-processing
	The trained model can be used to detect the target in unseen data. First, the model files
	are loaded from the specified fold. Afterwards, the network preprocesses the hold-out test
	set. The test samples are divided into patches similar to the training samples. For a robust
	result, the patches are rotated three times, and the three resulting predictions are then
	combined by averaging the pixel values. The network accuracy decreased towards the
	borders of patches, therefore the predictions are weighted by a Gaussian bell. Finally, the
	patches overlap by half of the patch size to get a smoother final result and stored as NIfTI
	files in the specified folder. The inference script and its steps are illustrated in Figure 3.11.
	<img src="Figures/postprocessing.png" width="1024px" />


	## 3. Adjustments of the nnU-Net Pipeline
	There are mainly two approaches that can
	be distinguished. The approach that requires a minimum change to the vanilla U-Net is
	created by adding the second label as a second channel to the last layer (late-branching).
	Only the parameters for an additional kernel in the last layer need to be trained additionally.
	The total number of parameters that need to be trained changes insignificantly. The second
	approach uses one decoder for every label (early-branching).
	<img src="Figures/early_late_branching.png" width="1024px" />

	The PNGs of glaciers had to be converted to the NIfTI file format. Because the glacier
	labels are 2D, the two labels were stacked in one label file with the label of the front located
	at z = 0 and the zone masks at z = 1. In the dataset.json, which contains the dataset’s
	metadata, the label entry contains a list of multiple labels with multiple classes instead of a
	single list of classes. After the dataset is in the desired directory and format, the nnU-Net
	scripts can be executed.
	Changes in the preprocessing concern mainly the added dimension of the labels in
	dataset.json. Meaning, there are now multiple labels, each with multiple classes. And
	not one label with multiple classes. During the planning of the experiment, the es-
	timated size of the network architecture is requested. This work implements a new
	class Generic_UNet_MTLearly, which returns the network’s size in a method (com-
	pute_approx_vram_consumption). For comparison, this value is also used for the late-
	branching network, even if its size is small. Otherwise, early and late branching networks
	would be trained on different patch sizes. Generic_UNet_MTLearly is derived from the
	given class Generic_UNet, which was included in the framework and is used in this work
	for the single task segmentation. The Generic_UNet_MTLearly contains a second decoder, which is created in the initialization of every instance of the class and used in the
	forward-function. The outputs of both decoders is concatenated before returned.
	Another class is responsible for the training of the network. The given nnUNetTrainerV2
	was used for the single task segmentation. For the MTL a new nnUNetTrainerMTLearly
	and nnUNetTrainerMTLlate were derived from the single task trainer. These trainer
	classes contain hyperparameters, e.g., a maximum number of epochs and deep supervision
	scales. They also trigger the initialization the network, run feedforward, compute the loss
	and trigger the update of the weights. The initialization of the network is done in the
	aforementioned Generic_UNet classes. For early-branching, the last layer and the layer
	for deep supervision are modified to create two channel outputs. For lat e-branching, the
	decoder is duplicated, and the results of the decoders are concatenated before the return of
	the feedforward. After every iteration, the error of both labels is calculated as described in
	Section 3.6.2 and summed up with an equal weighting (unitary scalarization).
	Only minor changes had to be made in the inference script (predict_simple.py). After
	the test samples are divided into patches and fed through the network, multiple channels
	of the network’s output had to be separated and the patch predictions are composed to
	the prediction of the whole image. A list of the edited and added scripts is provided in the following table.

	<img src="Figures/listoffiles.png" width="700" />

	## 4. Experiments
	The first research goal is to apply the out-of-the-box nnU-Net, how it is intended to
	be used, on the glacier front detection and on the glacier zone segmentation, which is
	represented by the first two columns in the Figure 5.1. 5-fold cross-validation is used to
	eliminate the bias of the weight initialization, and the bias of the data split into training
	and validation sets. Every column in Figure 5.1 represents five newly trained U-Nets,
	with common hyperparameters but different weight initialization. The evaluations of the
	individual models are averaged to get a robust measure independent of weight initialization
	and data split.
	<img src="Figures/experiments.png" width="1024" />
	raining the nnU-Net directly on the front labels is the most straightforward approach to
	get a method for calving front detection. The label of the calving front is dilated to the width
	of five pixels. In provisional experiments the dilation has shown to make the predictions
	more robust. For the training with zone labels, the evaluation script includes extraction of
	the boundary between ocean and glacier, which is described in more detail in Section 5.2.
	The following approach is to train the U-Net on the zone and front label simultaneously.
	Two architectures are compared. The early-branching and the late-branching network are
	described in Section 4.1. The fifth experiment of this work extracts the boundaries of the
	glacier zone to all other zones as a third segmentation task for the U-Net (see Figure 5.2).
	In contrast to the calving front segmentation, which is a particular part of the boundary
	between glacier and ocean. The label of the glacier boundaries was extracted from the zone
	label. All glacier pixels with a neighbouring rock or shadow pixel are assigned as glacier
	boundary. The hypothesis is that providing more information about the same domain
	benefits the performance of the U-Net on the individual task. The last experiment fuses
	the two labels by creating a fourth class in the zone label associated with the glacier front.
	Because the front line has a width of five pixels, the other zone classes are merely impaired.

	After the samples are converted and stored in the correct directory, the first nnU-Net
	script reprocesses the data, takes a fingerprint of the dataset, and generates a corresponding
	plan for the training. The training plan contains the number of layers, kernel size for every
	convolutional layer, patch, and batch size. For the glacier dataset and a NVIDIA RTX 3080
	with 12GB memory, the resulting network architecture has nine encoder blocks and eight
	decoder blocks (see Figure 3.8). Considering that every block has two convolutional layers,
	the network architecture is relatively deep compared to the U-net presented in [Ron+15].
	Deep networks usually suffer from vanishing gradients. Vanishing gradients is avoided
	in this U-Net with 34 convolutional layers using deep supervisions. Deep supervision is
	explained in more detail in Section 3.6.2. The kernels of all convolutional layers have the
	size of 3x3. During training, one batch contains two images. Each image has a patch size
	of 1024 x 896 pixels. The second nnU-Net script trains the network and stores the trained
	models. The U-Net is trained with an SGD optimizer with an initial learning rate of 0.01,
	a Nesterov momentum of 0.99, and a weight decay of 3e-5. Training of one epoch took
	between 100 s and 160 s. The nnU-Net uses early-stopping, but due to limited resources
	the maximum number of epochs (500) is reached in every training. The common way to
	define one epoch to iterate over every sample of the training set. The nnU-Net uses a fixed
	number of iterations (250) to define one epoch. In iteration the batch is sampled depending
	on the class distribution of the sample to counteract class-imbalance. After the training,
	the final model is used to predict the test set. The predictions of the test set are stored in
	NIfTI files. After the test predictions are converted back to PNG, the results are evaluated.

	The first nnU-Net script reprocesses the data, takes a fingerprint of the dataset, and generates a corresponding
	plan for the training. The training plan contains the number of layers, kernel size for every
	convolutional layer, patch, and batch size. For the glacier dataset and a NVIDIA RTX 3080
	with 12GB memory, the resulting network architecture has nine encoder blocks and eight
	decoder blocks. Considering that every block has two convolutional layers,
	the network architecture is relatively deep compared to the U-net presented in [Ron+15].
	Deep networks usually suffer from vanishing gradients. Vanishing gradients is avoided
	in this U-Net with 34 convolutional layers using deep supervisions. Deep supervision is
	explained in more detail in Section 3.6.2. The kernels of all convolutional layers have the
	size of 3x3. During training, one batch contains two images. Each image has a patch size
	of 1024 x 896 pixels. The second nnU-Net script trains the network and stores the trained
	models. The U-Net is trained with an SGD optimizer with an initial learning rate of 0.01,
	a Nesterov momentum of 0.99, and a weight decay of 3e-5. Training of one epoch took
	between 100 s and 160 s. The nnU-Net uses early-stopping, but due to limited resources
	the maximum number of epochs (500) is reached in every training. The common way to
	define one epoch to iterate over every sample of the training set. The nnU-Net uses a fixed
	number of iterations (250) to define one epoch. In iteration the batch is sampled depending
	on the class distribution of the sample to counteract class-imbalance. After the training,
	the final model is used to predict the test set. The predictions of the test set are stored in
	NIfTI files. After the test predictions are converted back to PNG, the results are evaluated.
	A visualization of the training progress of third experiment with a late-branching
	architecture is shown in the gif below The gif shows a random sample of the training set.
	The predictions of the nnU-Net after different numbers of epochs are superimposed on the
	input image. In epoch 0 The classes
	are randomly assigned to the pixels. This leads to a noisy pattern of where all classes are
	equally distributed.The third and last nnU-Net script executes the inference. After a few epochs
	the class distributions of the prediction
	is already close to the target distribution. A small number pixels is classified as the glacier
	front and large number of pixels classified as the glacier. The ocean classifications are large
	clusters but some of them are falsely located in the glacier zone. In the end the calving front
	and the ocean is classified correctly
	only some parts of the glacier are classified as rock and vice versa. Visually, the predictions
	are similar to the target

	<img src="create_plots_new/output/overlay.gif" width="412" /> <img src="create_plots_new/output/target (copy).png" width="412" />

	The evaluation metric measures how accurate, precise, and robust the method detects the
	position of the calving front. Additionally, the precision of the glacier zone segmentation is
	meaningful information. The mean distance between the front pixels of the label and the
	front pixels of the prediction is used to evaluate the calving front detection. For every pixel
	in the label front Y , the distance to the closest pixel in the predicted front X is determined.
	Additionally, the distance to its closest pixel in the predicted front is determined for every
	pixel in the label front. Both distances are averaged and taken as the mean distance between
	the two lines.
	<img src="Figures/evaluation.png" width="700" />
	## 5. Results
	The evaluation metrics described above, show that both tasks achieve higher
	accuracy with MTL compared to Single-Task Learning (STL). In Figure 6.1 the front
	delineation error of every experiment is compared. The STL approach that is trained on
	front label has a front delineation error of 1103 ± 72 m and the STL approach that is trained
	on the zone label has a front delineation error of 1184 ± 225 m. The difference between the
	STL experiments is that the variance of the performance of the trained model is higher
	when trained on the zone labels.
	<img src="Figures/fronterror.png" width="1024" />

	<img src="Figures/season.png" width="1024" />
	The distribution of the test set prediction is plotted in Figure 6.4. In the first row, all
	122 test samples are drawn as dots. The median is the middle line in the orange rectangle,
	and the dashed line represents the mean. The x-axis has a logarithmic scale. Otherwise, the
	outliers would dominate the plot. The rectangle reaches from the first quartile to the third
	quartile. Each quartile contains 25 % of the data points. The rows below represent the
	samples captured during different seasons. The test set contains two glaciers: Mapple and
	COL. The glaciers are located on different hemispheres, therefore the winter and summer
	months are different for each glacier. Winter in the northern hemisphere is from October
	to March, and winter in the southern hemisphere is from April to August. The mean of the
	front prediction of the samples captured during summer have higher precision 458 ± 1060 m
	than the samples captured during the winter months 996 ± 1683 m. However, the medians
	are more similar with 133 m in the summer month and 185 m in the winter month.

	<img src="Figures/glacier.png" width="1024" />
	In this Figure the distribution of the prediction is divided into the two glaciers. The
	front delineation error for the calving front of Mapple is, on average 127 ± 107 m while
	the mean error of COL is 1184 ± 1761 m. This is caused by a group of predictions with an
	error > 2000 m. The median value is 275 m for COL and 97 m for Mapple.

	<img src="Figures/satellite.png" width="1024" />
	In this Figure the front delineation error is grouped by satellite. The predictions of
	samples created by ERS, ENVISAT, PALSAR, and TDX have an similar average error
	between 150 m and 300 m. The prediction of samples created by TSX are more precise
	with 68 ± 59 m and the error on samples created by S1 are less precise with 2060 ± 2139 m.
	Most test samples are captured by TSX, TDX and S1. TSX and TDX have a resolution of
	6 − 7 m, while S1 has a resolution of 20 m.

	<img src="Figures/results.png" width="1024" />
	Calving front prediction of COL on 3.9.2011, 22.6.2014, and 11.2.2016 taken by
	TDX with 7 m2/pixel resolution; label (blue), prediction (yellow), overlap (magenta).
	<img src="Figures/results_mapple.png" width="1024" />

	(a) Glacier images taken by ERS (20 m2/pixel)
	on 5.2.2007, 20.3.2010, and 8.9.2017.
	(b) Glacier images taken by TSX (7 m2/pixel)
	on 4.11.2008, 2.11.2009, and 2.8.2013.
	Figure 6.9: Calving front prediction of Mapple Glacier; label (blue), prediction (yellow),
	overlap (magenta), bounding box (cyan).

	All plots are generated by the files in the directory create_plots_new or by hand.

	# vvv Readme of the original git project vvv

	[2020_10_21] Update: We now have documentation for [common questions](documentation/common_questions.md) and
	[common issues](documentation/common_problems_and_solutions.md). We now also provide [reference epoch times for
	several datasets and tips on how to identify bottlenecks](documentation/expected_epoch_times.md).

	Please read these documents before opening a new issue!

	# nnU-Net

	In 3D biomedical image segmentation, dataset properties like imaging modality, image sizes, voxel spacings, class
	ratios etc vary drastically.
	For example, images in
	the [Liver and Liver Tumor Segmentation Challenge dataset](https://competitions.codalab.org/competitions/17094)
	are computed tomography (CT) scans, about 512x512x512 voxels large, have isotropic voxel spacings and their
	intensity values are quantitative (Hounsfield Units).
	The [Automated Cardiac Diagnosis Challenge dataset](https://acdc.creatis.insa-lyon.fr/) on the other hand shows cardiac
	structures in cine MRI with a typical image shape of 10x320x320 voxels, highly anisotropic voxel spacings and
	qualitative intensity values. In addition, the ACDC dataset suffers from slice misalignments and a heterogeneity of
	out-of-plane spacings which can cause severe interpolation artifacts if not handled properly.

	In current research practice, segmentation pipelines are designed manually and with one specific dataset in mind.
	Hereby, many pipeline settings depend directly or indirectly on the properties of the dataset
	and display a complex co-dependence: image size, for example, affects the patch size, which in
	turn affects the required receptive field of the network, a factor that itself influences several other
	hyperparameters in the pipeline. As a result, pipelines that were developed on one (type of) dataset are inherently
	incomaptible with other datasets in the domain.

	**nnU-Net is the first segmentation method that is designed to deal with the dataset diversity found in the domain. It
	condenses and automates the keys decisions for designing a successful segmentation pipeline for any given dataset.**

	nnU-Net makes the following contributions to the field:

	1. Standardized baseline: nnU-Net is the first standardized deep learning benchmark in biomedical segmentation.
	Without manual effort, researchers can compare their algorithms against nnU-Net on an arbitrary number of datasets
	to provide meaningful evidence for proposed improvements.
	2. Out-of-the-box segmentation method: nnU-Net is the first plug-and-play tool for state-of-the-art biomedical
	segmentation. Inexperienced users can use nnU-Net out of the box for their custom 3D segmentation problem without
	need for manual intervention.
	3. Framework: nnU-Net is a framework for fast and effective development of segmentation methods. Due to its modular
	structure, new architectures and methods can easily be integrated into nnU-Net. Researchers can then benefit from its
	generic nature to roll out and evaluate their modifications on an arbitrary number of datasets in a
	standardized environment.

	For more information about nnU-Net, please read the following paper:

	Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2020). nnU-Net: a self-configuring method
	for deep learning-based biomedical image segmentation. Nature Methods, 1-9.

	Please also cite this paper if you are using nnU-Net for your research!

	# Table of Contents

	- [Installation](#installation)
	- [Usage](#usage)
	* [How to run nnU-Net on a new dataset](#how-to-run-nnu-net-on-a-new-dataset)
	+ [Dataset conversion](#dataset-conversion)
	+ [Experiment planning and preprocessing](#experiment-planning-and-preprocessing)
	+ [Model training](#model-training)
	- [2D U-Net](#2d-u-net)
	- [3D full resolution U-Net](#3d-full-resolution-u-net)
	- [3D U-Net cascade](#3d-u-net-cascade)
	* [3D low resolution U-Net](#3d-low-resolution-u-net)
	* [3D full resolution U-Net](#3d-full-resolution-u-net-1)
	- [Multi GPU training](#multi-gpu-training)
	+ [Identifying the best U-Net configuration](#identifying-the-best-u-net-configuration)
	+ [Run inference](#run-inference)
	* [How to run inference with pretrained models](#how-to-run-inference-with-pretrained-models)
	* [Examples](#examples)
	- [Extending/Changing nnU-Net](#extending-or-changing-nnu-net)
	- [Information on run time and potential performance bottlenecks.](#information-on-run-time-and-potential-performance-bottlenecks)
	- [Common questions and issues](#common-questions-and-issues)

	# Installation

	nnU-Net has been tested on Linux (Ubuntu 16, 18 and 20; centOS, RHEL). We do not provide support for other operating
	systems.

	nnU-Net requires a GPU! For inference, the GPU should have 4 GB of VRAM. For training nnU-Net models the GPU should have
	at
	least 10 GB (popular non-datacenter options are the RTX 2080ti, RTX 3080 or RTX 3090). Due to the use of automated mixed
	precision, fastest training times are achieved with the Volta architecture (Titan V, V100 GPUs) when installing pytorch
	the easy way. Since pytorch comes with cuDNN 7.6.5 and tensor core acceleration on Turing GPUs is not supported for 3D
	convolutions in this version, you will not get the best training speeds on Turing GPUs. You can remedy that by compiling
	pytorch from source
	(see [here](https://github.com/pytorch/pytorch#from-source)) using cuDNN 8.0.2 or newer. This will unlock Turing GPUs
	(RTX 2080ti, RTX 6000) for automated mixed precision training with 3D convolutions and make the training blistering
	fast as well. Note that future versions of pytorch may include cuDNN 8.0.2 or newer by default and
	compiling from source will not be necessary.
	We don't know the speed of Ampere GPUs with vanilla vs self-compiled pytorch yet - this section will be updated as
	soon as we know.

	For training, we recommend a strong CPU to go along with the GPU. At least 6 CPU cores (12 threads) are recommended. CPU
	requirements are mostly related to data augmentation and scale with the number of input channels. They are thus higher
	for datasets like BraTS which use 4 image modalities and lower for datasets like LiTS which only uses CT images.

	We very strongly recommend you install nnU-Net in a virtual environment.
	[Here is a quick how-to for Ubuntu.](https://linoxide.com/linux-how-to/setup-python-virtual-environment-ubuntu/)
	If you choose to compile pytorch from source, you will need to use conda instead of pip. In that case, please set the
	environment variable OMP_NUM_THREADS=1 (preferably in your bashrc using `export OMP_NUM_THREADS=1`). This is important!

	Python 2 is deprecated and not supported. Please make sure you are using Python 3.

	1) Install [PyTorch](https://pytorch.org/get-started/locally/). You need at least version 1.6
	2) Install nnU-Net depending on your use case:
	1) For use as standardized baseline, out-of-the-box segmentation algorithm or for running **inference with
	pretrained models**:

	```pip install nnunet```

	2) For use as integrative framework (this will create a copy of the nnU-Net code on your computer so that you
	can modify it as needed):
	```bash
	git clone https://github.com/MIC-DKFZ/nnUNet.git
	cd nnUNet
	pip install -e .
	```
	3) nnU-Net needs to know where you intend to save raw data, preprocessed data and trained models. For this you need to
	set a few of environment variables. Please follow the instructions [here](documentation/setting_up_paths.md).
	4) (OPTIONAL) Install [hiddenlayer](https://github.com/waleedka/hiddenlayer). hiddenlayer enables nnU-net to generate
	plots of the network topologies it generates (see [Model training](#model-training)). To install hiddenlayer,
	run the following commands:
	```bash
	pip install --upgrade git+https://github.com/FabianIsensee/hiddenlayer.git@more_plotted_details#egg=hiddenlayer
	```

	Installing nnU-Net will add several new commands to your terminal. These commands are used to run the entire nnU-Net
	pipeline. You can execute them from any location on your system. All nnU-Net commands have the prefix `nnUNet_` for
	easy identification.

	Note that these commands simply execute python scripts. If you installed nnU-Net in a virtual environment, this
	environment must be activated when executing the commands.

	All nnU-Net commands have a `-h` option which gives information on how to use them.

	A typical installation of nnU-Net can be completed in less than 5 minutes. If pytorch needs to be compiled from source
	(which is what we currently recommend when using Turing GPUs), this can extend to more than an hour.

	# Usage

	To familiarize yourself with nnU-Net we recommend you have a look at the [Examples](#Examples) before you start with
	your own dataset.

	## How to run nnU-Net on a new dataset

	Given some dataset, nnU-Net fully automatically configures an entire segmentation pipeline that matches its properties.
	nnU-Net covers the entire pipeline, from preprocessing to model configuration, model training, postprocessing
	all the way to ensembling. After running nnU-Net, the trained model(s) can be applied to the test cases for inference.

	### Dataset conversion

	nnU-Net expects datasets in a structured format. This format closely (but not entirely) follows the data structure of
	the [Medical Segmentation Decthlon](http://medicaldecathlon.com/). Please read
	[this](documentation/dataset_conversion.md) for information on how to convert datasets to be compatible with nnU-Net.

	### Experiment planning and preprocessing

	As a first step, nnU-Net extracts a dataset fingerprint (a set of dataset-specific properties such as
	image sizes, voxel spacings, intensity information etc). This information is used to create three U-Net configurations:
	a 2D U-Net, a 3D U-Net that operated on full resolution images as well as a 3D U-Net cascade where the first U-Net
	creates a coarse segmentation map in downsampled images which is then refined by the second U-Net.

	Provided that the requested raw dataset is located in the correct
	folder (`nnUNet_raw_data_base/nnUNet_raw_data/TaskXXX_MYTASK`,
	also see [here](documentation/dataset_conversion.md)), you can run this step with the following command:

	```bash
	nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity
	```

	`XXX` is the integer identifier associated with your Task name `TaskXXX_MYTASK`. You can pass several task IDs at once.

	Running `nnUNet_plan_and_preprocess` will populate your folder with preprocessed data. You will find the output in
	nnUNet_preprocessed/TaskXXX_MYTASK. `nnUNet_plan_and_preprocess` creates subfolders with preprocessed data for the 2D
	U-Net as well as all applicable 3D U-Nets. It will also create 'plans' files (with the ending.pkl) for the 2D and
	3D configurations. These files contain the generated segmentation pipeline configuration and will be read by the
	nnUNetTrainer (see below). Note that the preprocessed data folder only contains the training cases.
	The test images are not preprocessed (they are not looked at at all!). Their preprocessing happens on the fly during
	inference.

	`--verify_dataset_integrity` should be run at least for the first time the command is run on a given dataset. This will
	execute some
	checks on the dataset to ensure that it is compatible with nnU-Net. If this check has passed once, it can be
	omitted in future runs. If you adhere to the dataset conversion guide (see above) then this should pass without issues :
	-)

	Note that `nnUNet_plan_and_preprocess` accepts several additional input arguments. Running `-h` will list all of them
	along with a description. If you run out of RAM during preprocessing, you may want to adapt the number of processes
	used with the `-tl` and `-tf` options.

	After `nnUNet_plan_and_preprocess` is completed, the U-Net configurations have been created and a preprocessed copy
	of the data will be located at nnUNet_preprocessed/TaskXXX_MYTASK.

	Extraction of the dataset fingerprint can take from a couple of seconds to several minutes depending on the properties
	of the segmentation task. Pipeline configuration given the extracted finger print is nearly instantaneous (couple
	of seconds). Preprocessing depends on image size and how powerful the CPU is. It can take between seconds and several
	tens of minutes.

	### Model training

	nnU-Net trains all U-Net configurations in a 5-fold cross-validation. This enables nnU-Net to determine the
	postprocessing and ensembling (see next step) on the training dataset. Per default, all U-Net configurations need to
	be run on a given dataset. There are, however situations in which only some configurations (and maybe even without
	running the cross-validation) are desired. See [FAQ](documentation/common_questions.md) for more information.

	Note that not all U-Net configurations are created for all datasets. In datasets with small image sizes, the U-Net
	cascade is omitted because the patch size of the full resolution U-Net already covers a large part of the input images.

	Training models is done with the `nnUNet_train` command. The general structure of the command is:

	```bash
	nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD --npz (additional options)
	```

	CONFIGURATION is a string that identifies the requested U-Net configuration. TRAINER_CLASS_NAME is the name of the
	model trainer. If you implement custom trainers (nnU-Net as a framework) you can specify your custom trainer here.
	TASK_NAME_OR_ID specifies what dataset should be trained on and FOLD specifies which fold of the 5-fold-cross-validaton
	is trained.

	nnU-Net stores a checkpoint every 50 epochs. If you need to continue a previous training, just add a `-c` to the
	training command.

	IMPORTANT: `--npz` makes the models save the softmax outputs during the final validation. It should only be used for
	trainings
	where you plan to run `nnUNet_find_best_configuration` afterwards
	(this is nnU-Nets automated selection of the best performing (ensemble of) configuration(s), see below). If you are
	developing new
	trainer classes you may not need the softmax predictions and should therefore omit the `--npz` flag. Exported softmax
	predictions are very large and therefore can take up a lot of disk space.
	If you ran initially without the `--npz` flag but now require the softmax predictions, simply run

	```bash
	nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD -val --npz
	```

	to generate them. This will only rerun the validation, not the training.

	See `nnUNet_train -h` for additional options.

	#### 2D U-Net

	For FOLD in [0, 1, 2, 3, 4], run:

	```bash
	nnUNet_train 2d nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
	```

	#### 3D full resolution U-Net

	For FOLD in [0, 1, 2, 3, 4], run:

	```bash
	nnUNet_train 3d_fullres nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
	```

	#### 3D U-Net cascade

	##### 3D low resolution U-Net

	For FOLD in [0, 1, 2, 3, 4], run:

	```bash
	nnUNet_train 3d_lowres nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
	```

	##### 3D full resolution U-Net

	For FOLD in [0, 1, 2, 3, 4], run:

	```bash
	nnUNet_train 3d_cascade_fullres nnUNetTrainerV2CascadeFullRes TaskXXX_MYTASK FOLD --npz
	```

	Note that the 3D full resolution U-Net of the cascade requires the five folds of the low resolution U-Net to be
	completed beforehand!

	The trained models will we written to the RESULTS_FOLDER/nnUNet folder. Each training obtains an automatically generated
	output folder name:

	nnUNet_preprocessed/CONFIGURATION/TaskXXX_MYTASKNAME/TRAINER_CLASS_NAME__PLANS_FILE_NAME/FOLD

	For Task002_Heart (from the MSD), for example, this looks like this:

	RESULTS_FOLDER/nnUNet/
	├── 2d
	│ └── Task02_Heart
	│ └── nnUNetTrainerV2__nnUNetPlansv2.1
	│ ├── fold_0
	│ ├── fold_1
	│ ├── fold_2
	│ ├── fold_3
	│ └── fold_4
	├── 3d_cascade_fullres
	├── 3d_fullres
	│ └── Task02_Heart
	│ └── nnUNetTrainerV2__nnUNetPlansv2.1
	│ ├── fold_0
	│ │ ├── debug.json
	│ │ ├── model_best.model
	│ │ ├── model_best.model.pkl
	│ │ ├── model_final_checkpoint.model
	│ │ ├── model_final_checkpoint.model.pkl
	│ │ ├── network_architecture.pdf
	│ │ ├── progress.png
	│ │ └── validation_raw
	│ │ ├── la_007.nii.gz
	│ │ ├── la_007.pkl
	│ │ ├── la_016.nii.gz
	│ │ ├── la_016.pkl
	│ │ ├── la_021.nii.gz
	│ │ ├── la_021.pkl
	│ │ ├── la_024.nii.gz
	│ │ ├── la_024.pkl
	│ │ ├── summary.json
	│ │ └── validation_args.json
	│ ├── fold_1
	│ ├── fold_2
	│ ├── fold_3
	│ └── fold_4
	└── 3d_lowres

	Note that 3d_lowres and 3d_cascade_fullres are not populated because this dataset did not trigger the cascade. In each
	model training output folder (each of the fold_x folder, 10 in total here), the following files will be created (only
	shown for one folder above for brevity):

	- debug.json: Contains a summary of blueprint and inferred parameters used for training this model. Not easy to read,
	but very useful for debugging ;-)
	- model_best.model / model_best.model.pkl: checkpoint files of the best model identified during training. Not used right
	now.
	- model_final_checkpoint.model / model_final_checkpoint.model.pkl: checkpoint files of the final model (after training
	has ended). This is what is used for both validation and inference.
	- network_architecture.pdf (only if hiddenlayer is installed!): a pdf document with a figure of the network architecture
	in it.
	- progress.png: A plot of the training (blue) and validation (red) loss during training. Also shows an approximation of
	the evlauation metric (green). This approximation is the average Dice score of the foreground classes. It should,
	however, only to be taken with a grain of salt because it is computed on randomly drawn patches from the validation
	data at the end of each epoch, and the aggregation of TP, FP and FN for the Dice computation treats the patches as if
	they all originate from the same volume ('global Dice'; we do not compute a Dice for each validation case and then
	average over all cases but pretend that there is only one validation case from which we sample patches). The reason
	for
	this is that the 'global Dice' is easy to compute during training and is still quite useful to evaluate whether a
	model
	is training at all or not. A proper validation is run at the end of the training.
	- validation_raw: in this folder are the predicted validation cases after the training has finished. The summary.json
	contains the validation metrics (a mean over all cases is provided at the end of the file).

	During training it is often useful to watch the progress. We therefore recommend that you have a look at the generated
	progress.png when running the first training. It will be updated after each epoch.

	Training times largely depend on the GPU. The smallest GPU we recommend for training is the Nvidia RTX 2080ti. With
	this GPU (and pytorch compiled with cuDNN 8.0.2), all network trainings take less than 2 days.

	#### Multi GPU training

	Multi GPU training is experimental and NOT RECOMMENDED!

	nnU-Net supports two different multi-GPU implementation: DataParallel (DP) and Distributed Data Parallel (DDP)
	(but currently only on one host!). DDP is faster than DP and should be preferred if possible. However, if you did not
	install nnunet as a framework (meaning you used the `pip install nnunet` variant), DDP is not available. It requires a
	different way of calling the correct python script (see below) which we cannot support from our terminal commands.

	Distributed training currently only works for the basic trainers (2D, 3D full resolution and 3D low resolution) and not
	for the second, high resolution U-Net of the cascade. The reason for this is that distributed training requires some
	changes to the network and loss function, requiring a new nnUNet trainer class. This is, as of now, simply not
	implemented for the cascade, but may be added in the future.

	To run distributed training (DP), use the following command:

	```bash
	CUDA_VISIBLE_DEVICES=0,1,2... nnUNet_train_DP CONFIGURATION nnUNetTrainerV2_DP TASK_NAME_OR_ID FOLD -gpus GPUS --dbs
	```

	Note that nnUNetTrainerV2 was replaced with nnUNetTrainerV2_DP. Just like before, CONFIGURATION can be 2d, 3d_lowres or
	3d_fullres. TASK_NAME_OR_ID refers to the task you would like to train and FOLD is the fold of the cross-validation.
	GPUS (integer value) specifies the number of GPUs you wish to train on. To specify which GPUs you want to use, please
	make use of the
	CUDA_VISIBLE_DEVICES envorinment variable to specify the GPU ids (specify as many as you configure with -gpus GPUS).
	--dbs, if set, will distribute the batch size across GPUs. So if nnUNet configures a batch size of 2 and you run on 2
	GPUs
	, each GPU will run with a batch size of 1. If you omit --dbs, each GPU will run with the full batch size (2 for each
	GPU
	in this example for a total of batch size 4).

	To run the DDP training you must have nnU-Net installed as a framework. Your current working directory must be the
	nnunet folder (the one that has the dataset_conversion, evaluation, experiment_planning, ... subfolders!). You can then
	run
	the DDP training with the following command:

	```bash
	CUDA_VISIBLE_DEVICES=0,1,2... python -m torch.distributed.launch --master_port=XXXX --nproc_per_node=Y run/run_training_DDP.py CONFIGURATION nnUNetTrainerV2_DDP TASK_NAME_OR_ID FOLD --dbs
	```

	XXXX must be an open port for process-process communication (something like 4321 will do on most systems). Y is the
	number of GPUs you wish to use. Remember that we do not (yet) support distributed training across compute nodes. This
	all happens on the same system. Again, you can use CUDA_VISIBLE_DEVICES=0,1,2 to control what GPUs are used.
	If you run more than one DDP training on the same system (say you have 4 GPUs and you run two training with 2 GPUs each)
	you need to specify a different --master_port for each training!

	IMPORTANT!
	Multi-GPU training results in models that cannot be used for inference easily (as said above, all of this is
	experimental ;-) ).
	After finishing the training of all folds, run `nnUNet_change_trainer_class` on the folder where the trained model is
	(see `nnUNet_change_trainer_class -h` for instructions). After that you can run inference.

	### Identifying the best U-Net configuration

	Once all models are trained, use the following
	command to automatically determine what U-Net configuration(s) to use for test set prediction:

	```bash
	nnUNet_find_best_configuration -m 2d 3d_fullres 3d_lowres 3d_cascade_fullres -t XXX --strict
	```

	(all 5 folds need to be completed for all specified configurations!)

	On datasets for which the cascade was not configured, use `-m 2d 3d_fullres` instead. If you wish to only explore some
	subset of the configurations, you can specify that with the `-m` command. We recommend setting the
	`--strict` (crash if one of the requested configurations is
	missing) flag. Additional options are available (use `-h` for help).

	### Run inference

	Remember that the data located in the input folder must adhere to the format specified
	[here](documentation/data_format_inference.md).

	`nnUNet_find_best_configuration` will print a string to the terminal with the inference commands you need to use.
	The easiest way to run inference is to simply use these commands.

	If you wish to manually specify the configuration(s) used for inference, use the following commands:

	For each of the desired configurations, run:

	```
	nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t TASK_NAME_OR_ID -m CONFIGURATION --save_npz
	```

	Only specify `--save_npz` if you intend to use ensembling. `--save_npz` will make the command save the softmax
	probabilities alongside of the predicted segmentation masks requiring a lot of disk space.

	Please select a separate `OUTPUT_FOLDER` for each configuration!

	If you wish to run ensembling, you can ensemble the predictions from several configurations with the following command:

	```bash
	nnUNet_ensemble -f FOLDER1 FOLDER2 ... -o OUTPUT_FOLDER -pp POSTPROCESSING_FILE
	```

	You can specify an arbitrary number of folders, but remember that each folder needs to contain npz files that were
	generated by `nnUNet_predict`. For ensembling you can also specify a file that tells the command how to postprocess.
	These files are created when running `nnUNet_find_best_configuration` and are located in the respective trained model
	directory (
	RESULTS_FOLDER/nnUNet/CONFIGURATION/TaskXXX_MYTASK/TRAINER_CLASS_NAME__PLANS_FILE_IDENTIFIER/postprocessing.json or
	RESULTS_FOLDER/nnUNet/ensembles/TaskXXX_MYTASK/ensemble_X__Y__Z--X__Y__Z/postprocessing.json). You can also choose to
	not provide a file (simply omit -pp) and nnU-Net will not run postprocessing.

	Note that per default, inference will be done with all available folds. We very strongly recommend you use all 5 folds.
	Thus, all 5 folds must have been trained prior to running inference. The list of available folds nnU-Net found will be
	printed at the start of the inference.

	## How to run inference with pretrained models

	Trained models for all challenges we participated in are publicly available. They can be downloaded and installed
	directly with nnU-Net. Note that downloading a pretrained model will overwrite other models that were trained with
	exactly the same configuration (2d, 3d_fullres, ...), trainer (nnUNetTrainerV2) and plans.

	To obtain a list of available models, as well as a short description, run

	```bash
	nnUNet_print_available_pretrained_models
	```

	You can then download models by specifying their task name. For the Liver and Liver Tumor Segmentation Challenge,
	for example, this would be:

	```bash
	nnUNet_download_pretrained_model Task029_LiTS
	```

	After downloading is complete, you can use this model to run [inference](#run-inference). Keep in mind that each of
	these models has specific data requirements (Task029_LiTS runs on abdominal CT scans, others require several image
	modalities as input in a specific order).

	When using the pretrained models you must adhere to the license of the dataset they are trained on! If you run
	`nnUNet_download_pretrained_model` you will find a link where you can find the license for each dataset.

	## Examples

	To get you started we compiled two simple to follow examples:

	- run a training with the 3d full resolution U-Net on the Hippocampus dataset.
	See [here](documentation/training_example_Hippocampus.md).
	- run inference with nnU-Net's pretrained models on the Prostate dataset.
	See [here](documentation/inference_example_Prostate.md).

	Usability not good enough? Let us know!

	# Extending or Changing nnU-Net

	Please refer to [this](documentation/extending_nnunet.md) guide.

	# Information on run time and potential performance bottlenecks.

	We have compiled a list of expected epoch times on standardized datasets across many different GPUs. You can use them
	to verify that your system is performing as expected. There are also tips on how to identify bottlenecks and what
	to do about them.

	Click [here](documentation/expected_epoch_times.md).

	# Common questions and issues

	We have collected solutions to common [questions](documentation/common_questions.md) and
	[problems](documentation/common_problems_and_solutions.md). Please consult these documents before you open a new issue.

	--------------------

	<img src="HIP_Logo.png" width="512px" />

	nnU-Net is developed and maintained by the Applied Computer Vision Lab (ACVL) of
	the [Helmholtz Imaging Platform](http://helmholtz-imaging.de).