# FAQ
________________________________________________________________________________
**Q1: What should I do if I encounter OOM (out-of-memory) while training the
models?**
**A1**: To avoid OOM, you could try one or more of the following (a config
sketch follows this list):
1. reducing the training crop size (i.e., the flag `crop_size` in
   `train_dataset_options`; see Q2 for more details), which reduces the
   input size during training,
2. using a larger output stride (e.g., 32) in the backbone (i.e., the flag
   `output_stride` in `model_options`; see Q3 for more details), which
   reduces the usage of atrous convolution,
3. using a smaller backbone, such as ResNet-50.
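
As a concrete illustration, the sketch below applies these edits to a parsed
experiment config. The proto field paths are assumptions inferred from the flag
names above (`train_dataset_options.crop_size`,
`model_options.backbone.output_stride`); verify them against your version of
`deeplab2/config.proto`, and treat the file path as a placeholder.

```python
# A minimal sketch of the memory-saving config edits above. The proto field
# paths follow the flag names in this answer; verify them against your
# version of deeplab2/config.proto. The file path is a placeholder.
from google.protobuf import text_format
from deeplab2 import config_pb2

config = config_pb2.ExperimentOptions()
with open('configs/my_experiment.textproto') as f:
  text_format.Parse(f.read(), config)

# 1. Smaller training crop (must still satisfy output_stride * k + 1, see Q2).
config.train_dataset_options.crop_size[:] = [513, 513]
# 2. Larger output stride reduces the cost of atrous convolution (see Q3).
config.model_options.backbone.output_stride = 32
```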
________________________________________________________________________________
**Q2: What is the `crop_size` I need to set?**
**A2**: The DeepLab framework always uses `crop_size` equal to `output_stride` *
k + 1, where k is an integer (a small helper illustrating this rule is sketched
after the list below).
* During inference/evaluation, since the DeepLab framework uses whole-image
  inference, we need to set k so that the resulting `crop_size` (in
  `eval_dataset_options`) is slightly larger than the largest image dimension
  in the dataset. For example, we set eval_crop_size = 1025x2049 for Cityscapes
  images, whose dimensions are all equal to 1024x2048.
* During training, we could set k to be any integer as long as the resulting
  crop fits in your device memory. However, we notice better performance when
  we use the same `crop_size` during training and evaluation (i.e., also use
  the whole-image crop size during training).
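
To make the rule concrete, here is a minimal helper (an illustrative sketch,
not part of the deeplab2 API) that picks the smallest valid crop size covering
a given image dimension:

```python
import math

def valid_crop_size(image_dim, output_stride=16):
  """Smallest crop_size = output_stride * k + 1 that covers image_dim."""
  k = math.ceil((image_dim - 1) / output_stride)
  return output_stride * k + 1

# Cityscapes images are 1024 x 2048:
print(valid_crop_size(1024), valid_crop_size(2048))  # -> 1025 2049
```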
________________________________________________________________________________
**Q3: What output stride should I use in the backbone?**
**A3**: Using a different output stride leads to a different accuracy-and-memory
trade-off. For example, DeepLabv1 uses output stride = 8, but it requires a lot
of device memory. In the DeepLabv3+ paper, we found that using output stride =
16 strikes the best accuracy-and-memory trade-off, which is therefore our
default setting. If you wish to further reduce the memory usage, you could set
output stride to 32. Additionally, we suggest adjusting the `atrous_rates` in
the ASPP module as follows.
* If `backbone.output_stride` = 32, use `atrous_rates` = [3, 6, 9].
* If `backbone.output_stride` = 16, use `atrous_rates` = [6, 12, 18].
* If `backbone.output_stride` = 8, use `atrous_rates` = [12, 24, 36].
Note that these settings may not be optimal. You may need to adjust them to
better fit your dataset.
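
The pattern behind these defaults: when the output stride doubles, the feature
map shrinks by a factor of two, so the dilation rates are halved to keep a
similar effective field of view in the ASPP module. Expressed as a simple
lookup (an illustrative sketch, not a deeplab2 API):

```python
# Suggested atrous_rates for each backbone output stride, per the list above.
ATROUS_RATES_BY_OUTPUT_STRIDE = {
    32: [3, 6, 9],
    16: [6, 12, 18],
    8: [12, 24, 36],
}

output_stride = 16
atrous_rates = ATROUS_RATES_BY_OUTPUT_STRIDE[output_stride]  # [6, 12, 18]
```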
________________________________________________________________________________
**Q4: Why are the results reported by the provided evaluation code slightly
different from the official evaluation code (e.g.,
[Cityscapes](https://github.com/mcordts/cityscapesScripts))?**
**A4**: In order to run everything end-to-end in the TensorFlow system (e.g.,
the on-line evaluation during training), we re-implemented the evaluation code
in TensorFlow. Additionally, our whole system, including the training and
evaluation pipelines, uses the panoptic label format (i.e., `panoptic_label =
semantic_label * label_divisor + instance_id`, where the `label_divisor` should
be larger than the maximum number of instances per image), instead of the JSON
[COCO formats](https://cocodataset.org/#format-data). These two changes, along
with rounding and similar numerical issues, result in some minor differences.
Therefore, our re-implemented evaluation code is mainly used for TensorFlow
integration (e.g., the support of on-line evaluation in TensorBoard). Users
should run the corresponding official evaluation code in order to compare with
other published papers. Note that all the numbers reported in our papers are
evaluated with the official evaluation code.
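
For reference, the panoptic label format above can be encoded and decoded as
follows (an illustrative sketch; the `label_divisor` value is dataset-dependent
and the one shown here is only an example):

```python
label_divisor = 256  # must exceed the maximum number of instances per image

def encode_panoptic(semantic_label, instance_id):
  """panoptic_label = semantic_label * label_divisor + instance_id."""
  return semantic_label * label_divisor + instance_id

def decode_panoptic(panoptic_label):
  """Recovers (semantic_label, instance_id) from a panoptic label."""
  return panoptic_label // label_divisor, panoptic_label % label_divisor

# Example: semantic class 7, instance 3.
assert decode_panoptic(encode_panoptic(7, 3)) == (7, 3)
```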
To facilitate the conversion between prediction formats, we also provide
instructions for running the official evaluation codes on
[Cityscapes](setup/cityscapes_test_server_evaluation.md) and
[COCO](setup/coco_test_server_evaluation.md).
________________________________________________________________________________
**Q5: What should I do if I cannot compile TensorFlow with the provided
efficient merging operation `merge_semantic_and_instance_maps`?**
**A5**: In this case, we provide a fallback solution that implements the
merging operation with pure tf functions. This fallback does not require any
TensorFlow compilation. However, note that compared to our provided TensorFlow
merging operation `merge_semantic_and_instance_maps`, its inference speed is
slower and the resulting segmentation performance may also be slightly lower.
To use the pure-tf-function version of `merge_semantic_and_instance_maps`, set
`merge_semantic_instance_with_tf_op` to `false` in your config's
`evaluator_options`.
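
For example, the setting would look like the following when parsed into the
experiment config (a sketch assuming the deeplab2 `ExperimentOptions` proto
layout; the flag name is the one given above):

```python
# A minimal sketch: enabling the pure-tf fallback via the config. Assumes
# the deeplab2 ExperimentOptions proto; verify against your config.proto.
from google.protobuf import text_format
from deeplab2 import config_pb2

config = text_format.Parse(
    """
    evaluator_options {
      merge_semantic_instance_with_tf_op: false
    }
    """,
    config_pb2.ExperimentOptions())
```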