|
# Dataset conversion instructions |
|
nnU-Net requires the raw data to be brought into a specific format so that it know how to read and interpret it. This |
|
format closely, but not entirely, follows the format used by the |
|
[Medical Segmentation Decathlon](http://medicaldecathlon.com/) (MSD). |
|
|
|
The entry point to nnU-Net is the nnUNet_raw_data_base folder (which the user needs to specify when installing nnU-Net!). |
|
Each segmentation dataset is stored as a separate 'Task'. Tasks are associated with a task ID, a three digit integer |
|
(this is different from the MSD!) and |
|
a task name (which you can freely choose): Task005_Prostate has 'Prostate' as task name and the task id is 5. Tasks are stored in the |
|
nnUNet_raw_data_base/nnUNet_raw_data folder like this: |
|
|
|
nnUNet_raw_data_base/nnUNet_raw_data/ |
|
βββ Task001_BrainTumour |
|
βββ Task002_Heart |
|
βββ Task003_Liver |
|
βββ Task004_Hippocampus |
|
βββ Task005_Prostate |
|
βββ ... |
|
|
|
Within each task folder, the following structure is expected: |
|
|
|
Task001_BrainTumour/ |
|
βββ dataset.json |
|
βββ imagesTr |
|
βββ (imagesTs) |
|
βββ labelsTr |
|
|
|
**Please make your custom task ids start at 500 to ensure that there will be no conflicts with downloaded pretrained models!!! (IDs cannot exceed 999)** |
|
|
|
imagesTr contains the images belonging to the training cases. nnU-Net will run pipeline configuration, training with |
|
cross-validation, as well as finding postprocesing and the best ensemble on this data. imagesTs (optional) contains the |
|
images that belong to the |
|
test cases , labelsTr the images with the ground truth segmentation maps for the training cases. dataset.json contains |
|
metadata of the dataset. |
|
|
|
Each training case is associated with an identifier = a unique name for that case. This identifier is used by nnU-Net to |
|
recognize which label file belongs to which image. **All images (including labels) must be 3D nifti files (.nii.gz)!** |
|
|
|
The image files can have any scalar pixel type. The label files must contain segmentation maps that contain consecutive integers, |
|
starting with 0: 0, 1, 2, 3, ... num_labels. 0 is considered background. Each class then has its own associated integer |
|
value. |
|
Images may have multiple modalities. This is especially often the case for medical images. Modalities are very much |
|
like color channels in photos (three color channels: red, green blue), but can be much more diverse: CT, different types |
|
or MRI, and many other. Imaging modalities are identified by nnU-Net by their suffix: a four-digit integer at the end |
|
of the filename. Imaging files must therefore follow the following naming convention: case_identifier_XXXX.nii.gz. |
|
Hereby, XXXX is the modality identifier. What modalities these identifiers belong to is specified in the dataset.json |
|
file (see below). Label files are saved as case_identifier.nii.gz |
|
|
|
This naming scheme results in the following folder structure. It is the responsibility of the user to bring their |
|
data into this format! |
|
|
|
Here is an example for the first Task of the MSD: BrainTumour. Each image has four modalities: FLAIR (0000), |
|
T1w (0001), T1gd (0002) and T2w (0003). Note that the imagesTs folder is optional and does not have to be present. |
|
|
|
nnUNet_raw_data_base/nnUNet_raw_data/Task001_BrainTumour/ |
|
βββ dataset.json |
|
βββ imagesTr |
|
βΒ Β βββ BRATS_001_0000.nii.gz |
|
βΒ Β βββ BRATS_001_0001.nii.gz |
|
βΒ Β βββ BRATS_001_0002.nii.gz |
|
βΒ Β βββ BRATS_001_0003.nii.gz |
|
βΒ Β βββ BRATS_002_0000.nii.gz |
|
βΒ Β βββ BRATS_002_0001.nii.gz |
|
βΒ Β βββ BRATS_002_0002.nii.gz |
|
βΒ Β βββ BRATS_002_0003.nii.gz |
|
βΒ Β βββ BRATS_003_0000.nii.gz |
|
βΒ Β βββ BRATS_003_0001.nii.gz |
|
βΒ Β βββ BRATS_003_0002.nii.gz |
|
βΒ Β βββ BRATS_003_0003.nii.gz |
|
βΒ Β βββ BRATS_004_0000.nii.gz |
|
βΒ Β βββ BRATS_004_0001.nii.gz |
|
βΒ Β βββ BRATS_004_0002.nii.gz |
|
βΒ Β βββ BRATS_004_0003.nii.gz |
|
βΒ Β βββ ... |
|
βββ imagesTs |
|
βΒ Β βββ BRATS_485_0000.nii.gz |
|
βΒ Β βββ BRATS_485_0001.nii.gz |
|
βΒ Β βββ BRATS_485_0002.nii.gz |
|
βΒ Β βββ BRATS_485_0003.nii.gz |
|
βΒ Β βββ BRATS_486_0000.nii.gz |
|
βΒ Β βββ BRATS_486_0001.nii.gz |
|
βΒ Β βββ BRATS_486_0002.nii.gz |
|
βΒ Β βββ BRATS_486_0003.nii.gz |
|
βΒ Β βββ BRATS_487_0000.nii.gz |
|
βΒ Β βββ BRATS_487_0001.nii.gz |
|
βΒ Β βββ BRATS_487_0002.nii.gz |
|
βΒ Β βββ BRATS_487_0003.nii.gz |
|
βΒ Β βββ BRATS_488_0000.nii.gz |
|
βΒ Β βββ BRATS_488_0001.nii.gz |
|
βΒ Β βββ BRATS_488_0002.nii.gz |
|
βΒ Β βββ BRATS_488_0003.nii.gz |
|
βΒ Β βββ BRATS_489_0000.nii.gz |
|
βΒ Β βββ BRATS_489_0001.nii.gz |
|
βΒ Β βββ BRATS_489_0002.nii.gz |
|
βΒ Β βββ BRATS_489_0003.nii.gz |
|
βΒ Β βββ ... |
|
βββ labelsTr |
|
βββ BRATS_001.nii.gz |
|
βββ BRATS_002.nii.gz |
|
βββ BRATS_003.nii.gz |
|
βββ BRATS_004.nii.gz |
|
βββ ... |
|
|
|
Here is another example of the second task of the MSD, which has only one modality: |
|
|
|
nnUNet_raw_data_base/nnUNet_raw_data/Task002_Heart/ |
|
βββ dataset.json |
|
βββ imagesTr |
|
βΒ Β βββ la_003_0000.nii.gz |
|
βΒ Β βββ la_004_0000.nii.gz |
|
βΒ Β βββ ... |
|
βββ imagesTs |
|
βΒ Β βββ la_001_0000.nii.gz |
|
βΒ Β βββ la_002_0000.nii.gz |
|
βΒ Β βββ ... |
|
βββ labelsTr |
|
βββ la_003.nii.gz |
|
βββ la_004.nii.gz |
|
βββ ... |
|
|
|
For each training case, all images must have the same geometry to ensure that their pixel arrays are aligned. Also |
|
make sure that all your data is co-registered! |
|
|
|
The dataset.json file used by nnU-Net is identical to the ones used by the MSD. For your custom tasks you need to create |
|
them as well and thereby exactly follow the same structure. [This](https://drive.google.com/drive/folders/1HqEgzS8BV2c7xYNrZdEAnrHk7osJJ--2) |
|
is where you can download the MSD data for reference. |
|
|
|
**NEW:** There now is a utility with which you can generate the dataset.json automatically. You can find it |
|
[here](../nnunet/dataset_conversion/utils.py) (look for the function `generate_dataset_json`). |
|
See [Task120](../nnunet/dataset_conversion/Task120_Massachusetts_RoadSegm.py) for an example on how to use it. And read |
|
its documentation! |
|
|
|
Here is the content of the dataset.json from the Prostate task: |
|
|
|
{ |
|
"name": "PROSTATE", |
|
"description": "Prostate transitional zone and peripheral zone segmentation", |
|
"reference": "Radboud University, Nijmegen Medical Centre", |
|
"licence":"CC-BY-SA 4.0", |
|
"relase":"1.0 04/05/2018", |
|
"tensorImageSize": "4D", |
|
"modality": { |
|
"0": "T2", |
|
"1": "ADC" |
|
}, |
|
"labels": { |
|
"0": "background", |
|
"1": "PZ", |
|
"2": "TZ" |
|
}, |
|
"numTraining": 32, |
|
"numTest": 16, |
|
"training":[{"image":"./imagesTr/prostate_16.nii.gz","label":"./labelsTr/prostate_16.nii.gz"},{"image":"./imagesTr/prostate_04.nii.gz","label":"./labelsTr/prostate_04.nii.gz"},...], |
|
"test": ["./imagesTs/prostate_08.nii.gz","./imagesTs/prostate_22.nii.gz","./imagesTs/prostate_30.nii.gz",...] |
|
} |
|
|
|
Note that we truncated the "training" and "test" lists for clarity. You need to specify all the cases in there. If you |
|
don't have test images (imagesTs does not exist) you can leave "test" blank: `"test": []`. |
|
|
|
Please also have a look at the python files located [here](../nnunet/dataset_conversion). They show how we created our |
|
custom dataset.jsons for a range of public datasets. |
|
|
|
## How to use decathlon datasets |
|
The previous release of nnU-Net allowed users to either start with 4D or 3D niftis. This resulted in some confusion, |
|
however, because some users would not know where they should save their data. We therefore dropped support for the 4D |
|
niftis used by the MSD. Instead, we provide a utility that converts the MSD datasets into the format specified above: |
|
|
|
```bash |
|
nnUNet_convert_decathlon_task -i FOLDER_TO_TASK_AS_DOWNLOADED_FROM_MSD -p NUM_PROCESSES |
|
``` |
|
|
|
FOLDER_TO_TASK_AS_DOWNLOADED_FROM_MSD needs to point to the downloaded task folder (such as Task05_Prostate, note the |
|
2-digit task id!). The converted Task will be saved under the same name in nnUNet_raw_data_base/nnUNet_raw_data |
|
(but with a 3 digit identifier). You can overwrite the task id of the converted task by using the `-output_task_id` option. |
|
|
|
|
|
## How to use 2D data with nnU-Net |
|
nnU-Net was originally built for 3D images. It is also strongest when applied to 3D segmentation problems because a |
|
large proportion of its design choices were built with 3D in mind. Also note that many 2D segmentation problems, |
|
especially in the non-biomedical domain, may benefit from pretrained network architectures which nnU-Net does not |
|
support. |
|
Still, there is certainly a need for an out of the box segmentation solution for 2D segmentation problems. And |
|
also on 2D segmentation tasks nnU-Net cam perform extremely well! We have, for example, won a 2D task in the cell |
|
tracking challenge with nnU-Net (see our Nature Methods paper) and we have also successfully applied nnU-Net to |
|
histopathological segmentation problems. |
|
Working with 2D data in nnU-Net requires a small workaround in the creation of the dataset. Essentially, all images |
|
must be converted to pseudo 3D images (so an image with shape (X, Y) needs to be converted to an image with shape |
|
(1, X, Y). The resulting image must be saved in nifti format. Hereby it is important to set the spacing of the |
|
first axis (the one with shape 1) to a value larger than the others. If you are working with niftis anyways, then |
|
doing this should be easy for you. This example here is intended for demonstrating how nnU-Net can be used with |
|
'regular' 2D images. We selected the massachusetts road segmentation dataset for this because it can be obtained |
|
easily, it comes with a good amount of training cases but is still not too large to be difficult to handle. |
|
|
|
See [here](../nnunet/dataset_conversion/Task120_Massachusetts_RoadSegm.py) for an example. |
|
This script contains a lot of comments and useful information. Also have a look |
|
[here](../nnunet/dataset_conversion/Task089_Fluo-N2DH-SIM.py). |
|
|
|
## How to update an existing dataset |
|
When updating a dataset you not only need to change the data located in `nnUNet_raw_data_base/nnUNet_raw_data`. Make |
|
sure to also delete the whole (!) corresponding dataset in `nnUNet_raw_data_base/nnUNet_cropped_data`. nnU-Net will not |
|
repeat the cropping (and thus will not update your dataset) if the old files are still in nnUNet_cropped_data! |
|
|
|
The best way of updating an existing dataset is (**choose one**): |
|
- delete all data and models belonging to the old version of the dataset (nnUNet_preprocessed, corresponding results |
|
in RESULTS_FOLDER/nnUNet, nnUNet_cropped_data, nnUNet_raw_data), then update |
|
- (recommended) create the updated dataset from scratch using a new task ID **and** name |
|
|
|
|
|
## How to convert other image formats to nifti |
|
Please have a look at the following tasks: |
|
- [Task120](../nnunet/dataset_conversion/Task120_Massachusetts_RoadSegm.py): 2D png images |
|
- [Task075](../nnunet/dataset_conversion/Task075_Fluo_C3DH_A549_ManAndSim.py) and [Task076](../nnunet/dataset_conversion/Task076_Fluo_N3DH_SIM.py): 3D tiff |
|
- [Task089](../nnunet/dataset_conversion/Task089_Fluo-N2DH-SIM.py) 2D tiff |