# Running DeepLab on ADE20K Semantic Segmentation Dataset

This page walks through the steps required to run DeepLab on the ADE20K dataset on a local machine.

## Download dataset and convert to TFRecord

We have prepared the script (under the folder `datasets`) to download and convert the ADE20K semantic segmentation dataset to TFRecord.

```bash
# From the tensorflow/models/research/deeplab/datasets directory.
bash download_and_convert_ade20k.sh
```

The converted dataset will be saved at `./deeplab/datasets/ADE20K/tfrecord`.

## Recommended Directory Structure for Training and Evaluation

```
+ datasets
  - build_data.py
  - build_ade20k_data.py
  - download_and_convert_ade20k.sh
  + ADE20K
    + tfrecord
    + exp
      + train_on_train_set
        + train
        + eval
        + vis
    + ADEChallengeData2016
      + annotations
        + training
        + validation
      + images
        + training
        + validation
```

where the folder `train_on_train_set` stores the train/eval/vis events and results (when training DeepLab on the ADE20K train set).

## Running the train/eval/vis jobs

A local training job using `xception_65` can be run with the following command:

```bash
# From tensorflow/models/research/
python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=150000 \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size="513,513" \
    --train_batch_size=4 \
    --min_resize_value=513 \
    --max_resize_value=513 \
    --resize_factor=16 \
    --dataset="ade20k" \
    --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}
```

where `${PATH_TO_INITIAL_CHECKPOINT}` is the path to the initial checkpoint, `${PATH_TO_TRAIN_DIR}` is the directory to which training checkpoints and events will be written (it is recommended to set it to the `train_on_train_set/train` folder above), and `${PATH_TO_DATASET}` is the directory in which the ADE20K dataset resides (the `tfrecord` folder above).

**Note that for train.py:**

1.  In order to fine-tune the batch norm layers, one needs to use a large batch size (> 12) and set `fine_tune_batch_norm=True`. Here, we simply use a small batch size during training for the purpose of demonstration. If you have limited GPU memory at hand, please fine-tune from our provided checkpoints, whose batch norm parameters have already been trained, and use a smaller learning rate with `fine_tune_batch_norm=False`.

2.  Users should fine-tune `min_resize_value` and `max_resize_value` to get better results. Note that `resize_factor` has to be equal to `output_stride`.

3.  Users should change `atrous_rates` from [6, 12, 18] to [12, 24, 36] if setting `output_stride=8`.

4.  Users can skip the flag `decoder_output_stride` if they do not want to use the decoder structure.

Sketched eval and vis commands, analogous to the train command above, are given at the end of this page.

## Running Tensorboard

Progress for training and evaluation jobs can be inspected using Tensorboard. If using the recommended directory structure, Tensorboard can be run using the following command:

```bash
tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
```

where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the train directory (e.g., the folder `train_on_train_set` in the above example). Please note it may take Tensorboard a couple of minutes to populate with data.
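
## Example eval and vis commands

As referenced in the train/eval/vis section above, the following is a minimal sketch of a local evaluation job using `xception_65`. The flags mirror the train command; the specific values used here (in particular `eval_crop_size` and the resize flags) are assumptions and may need to be adjusted for your setup.

```bash
# From tensorflow/models/research/
# A sketch, not an official recipe: flag values mirror the train command above.
python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size="513,513" \
    --min_resize_value=513 \
    --max_resize_value=513 \
    --resize_factor=16 \
    --dataset="ade20k" \
    --checkpoint_dir=${PATH_TO_TRAIN_DIR} \
    --eval_logdir=${PATH_TO_EVAL_DIR} \
    --dataset_dir=${PATH_TO_DATASET}
```

where `${PATH_TO_EVAL_DIR}` is the directory to which evaluation events will be written (it is recommended to set it to the `train_on_train_set/eval` folder above).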
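
Similarly, a sketch of a local visualization job; `vis_crop_size` and `colormap_type` are likewise assumptions to adjust as needed:

```bash
# From tensorflow/models/research/
# A sketch, not an official recipe: flag values mirror the train command above.
python deeplab/vis.py \
    --logtostderr \
    --vis_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --vis_crop_size="513,513" \
    --min_resize_value=513 \
    --max_resize_value=513 \
    --resize_factor=16 \
    --dataset="ade20k" \
    --colormap_type="ade20k" \
    --checkpoint_dir=${PATH_TO_TRAIN_DIR} \
    --vis_logdir=${PATH_TO_VIS_DIR} \
    --dataset_dir=${PATH_TO_DATASET}
```

where `${PATH_TO_VIS_DIR}` is the directory to which the predicted segmentation maps will be written (it is recommended to set it to the `train_on_train_set/vis` folder above).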