# FAQ

________________________________________________________________________________

**Q1: What should I do if I encounter OOM (out-of-memory) while training the models?**

**A1**: To avoid OOM, you could try:

1. reducing the training crop size (i.e., the flag `crop_size` in `train_dataset_options`; see Q2 for more details), which reduces the input size during training,

2. using a larger output stride (e.g., 32) in the backbone (i.e., the flag `output_stride` in `model_options`; see Q3 for more details), which reduces the use of atrous convolution,

3. using a smaller backbone, such as ResNet-50.

________________________________________________________________________________

**Q2: What `crop_size` do I need to set?**

**A2**: The DeepLab framework always uses a `crop_size` equal to `output_stride` * k + 1, where k is an integer (see the sketch after Q3 for a worked example).

* During inference/evaluation, since the DeepLab framework uses whole-image inference, we need to set k so that the resulting `crop_size` (in `eval_dataset_options`) is slightly larger than the largest image dimension in the dataset. For example, we set eval_crop_size = 1025x2049 for Cityscapes, whose images are all 1024x2048.

* During training, we could set k to any integer, as long as the resulting crop fits in your device memory. However, we observe better performance when the same `crop_size` is used during training and evaluation (i.e., the whole-image crop size is also used during training).

________________________________________________________________________________

**Q3: What output stride should I use in the backbone?**

**A3**: Using a different output stride leads to a different accuracy-and-memory trade-off. For example, DeepLabv1 uses output stride = 8, but it requires a lot of device memory. In the DeepLabv3+ paper, we found that output stride = 16 strikes the best accuracy-and-memory trade-off, which is therefore our default setting. If you wish to further reduce memory usage, you could set the output stride to 32. Additionally, we suggest adjusting the `atrous_rates` in the ASPP module as follows.

* If `backbone.output_stride` = 32, use `atrous_rates` = [3, 6, 9].

* If `backbone.output_stride` = 16, use `atrous_rates` = [6, 12, 18].

* If `backbone.output_stride` = 8, use `atrous_rates` = [12, 24, 36].

Note that these settings may not be optimal. You may need to adjust them to better fit your dataset.
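Below is a minimal Python sketch of the `crop_size` rule from Q2 and the `atrous_rates` suggestions from Q3. The helper `valid_crop_size` is hypothetical (it is not part of this codebase) and only illustrates the `output_stride` * k + 1 arithmetic:

```python
# Hypothetical helper (not part of this codebase): returns the smallest
# crop size of the form output_stride * k + 1 that covers image_dim.
def valid_crop_size(image_dim: int, output_stride: int) -> int:
  k = -(-(image_dim - 1) // output_stride)  # ceil((image_dim - 1) / output_stride)
  return output_stride * k + 1

# Suggested ASPP atrous_rates for each backbone output_stride (see Q3).
ATROUS_RATES = {32: [3, 6, 9], 16: [6, 12, 18], 8: [12, 24, 36]}

if __name__ == '__main__':
  # Cityscapes images are all 1024 x 2048, so with output_stride = 32:
  output_stride = 32
  print(valid_crop_size(1024, output_stride))  # 1025 (= 32 * 32 + 1)
  print(valid_crop_size(2048, output_stride))  # 2049 (= 32 * 64 + 1)
  print(ATROUS_RATES[output_stride])           # [3, 6, 9]
```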
________________________________________________________________________________

**Q4: Why are the results reported by the provided evaluation code slightly different from the official evaluation code (e.g., [Cityscapes](https://github.com/mcordts/cityscapesScripts))?**

**A4**: In order to run everything end-to-end in the TensorFlow system (e.g., the online evaluation during training), we re-implemented the evaluation code in TensorFlow. Additionally, our whole system, including the training and evaluation pipelines, uses the panoptic label format (i.e., `panoptic_label = semantic_label * label_divisor + instance_id`, where `label_divisor` should be larger than the maximum number of instances per image; for example, with `label_divisor` = 1000, semantic label 13 and instance id 7 are encoded as 13 * 1000 + 7 = 13007), instead of the JSON [COCO formats](https://cocodataset.org/#format-data). These two changes, along with rounding and similar issues, result in some minor differences. Therefore, our re-implemented evaluation code is mainly used for TensorFlow integration (e.g., the support of online evaluation in TensorBoard). Users should run the corresponding official evaluation code in order to compare with other published papers.

Note that all the numbers reported in our papers are evaluated with the official evaluation code. To facilitate the conversion between prediction formats, we also provide instructions for running the official evaluation code on [Cityscapes](setup/cityscapes_test_server_evaluation.md) and [COCO](setup/coco_test_server_evaluation.md).

________________________________________________________________________________

**Q5: What should I do if I cannot compile TensorFlow with the provided efficient merging operation `merge_semantic_and_instance_maps`?**

**A5**: In this case, we provide a fallback solution that implements the merging operation with pure tf functions. This fallback solution does not require any TensorFlow compilation. However, note that compared to our provided TensorFlow merging operation `merge_semantic_and_instance_maps`, its inference speed is slower and the resulting segmentation performance may also be slightly lower.

To use the pure-tf-function version of `merge_semantic_and_instance_maps`, set `merge_semantic_instance_with_tf_op` to `false` in your config's `evaluator_options`, for example:
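The snippet below is a sketch of the relevant config fragment only (the rest of your experiment config stays unchanged); both field names are as described above:

```textproto
# Fall back to the pure-tf-function merging, which needs no custom-op build.
evaluator_options {
  merge_semantic_instance_with_tf_op: false
}
```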