METRICS_TAB_TEXT = '''
🎯 Average Precision (AP) and Average Recall (AR) are popular metrics for evaluating the accuracy of object detectors by estimating the Precision-Recall relationship. Here you will find details about the object detection metrics reported in our leaderboard.

# Metrics

These metrics come in many variations, depending on the IoU threshold, the area of the object, and the number of detections per image. The most popular ones are:

## Average Precision (AP)
- **AP**: AP averaged over IoU thresholds from .50 to .95 in steps of .05 (IoU=.50:.05:.95)
- **AP@.50 (AP<sup>IoU=.50</sup>)**: AP at IoU=.50 (similar to the PASCAL VOC mAP metric)
- **AP@.75 (AP<sup>IoU=.75</sup>)**: AP at IoU=.75 (strict metric)

## Average Precision Across Scales
- **AP-S (AP<sup>small</sup>)**: AP for small objects: area < 32<sup>2</sup>.
- **AP-M (AP<sup>medium</sup>)**: AP for medium objects: 32<sup>2</sup> < area < 96<sup>2</sup>.
- **AP-L (AP<sup>large</sup>)**: AP for large objects: area > 96<sup>2</sup>.

## Average Recall (AR)
- **AR1 (AR<sup>max=1</sup>)**: AR given 1 detection per image.
- **AR10 (AR<sup>max=10</sup>)**: AR given 10 detections per image.
- **AR100 (AR<sup>max=100</sup>)**: AR given 100 detections per image.

## Average Recall Across Scales
- **AR-S (AR<sup>small</sup>)**: AR for small objects: area < 32<sup>2</sup>.
- **AR-M (AR<sup>medium</sup>)**: AR for medium objects: 32<sup>2</sup> < area < 96<sup>2</sup>.
- **AR-L (AR<sup>large</sup>)**: AR for large objects: area > 96<sup>2</sup>.

## How to reproduce our results

Different tools compute these metrics in different ways. For this leaderboard's evaluation, we use the COCO evaluation approach, which can be found in the [COCO evaluation toolkit]().
The 🤗 `Evaluate` metric used to measure the results is also accessible on the Hub: [detection_metrics]().

Note that the results presented here may differ slightly from those reported by other sources. These differences can be attributed to numerical approximations, variations in batch sizes, and other hyperparameters. To ensure a consistent evaluation, we recommend using a batch size of 1 and setting the confidence threshold to 0 when evaluating your model.

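
As a rough illustration of the quantities behind the metric names above, the sketch below computes the IoU of two boxes, lists the ten IoU thresholds that the primary AP averages over, and assigns an object to a size range by its pixel area. It is not the leaderboard's evaluation code: the function names are hypothetical, boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, and the handling of areas exactly equal to 32<sup>2</sup> or 96<sup>2</sup> is an assumption, since the ranges above do not specify it.

```python
# Illustrative sketch only: these helper names are hypothetical and are not
# part of the COCO evaluation toolkit or the leaderboard code.

def box_iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds .50, .55, ..., .95 written as IoU=.50:.05:.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def size_bucket(area):
    """Map an object area in pixels to the small / medium / large range.

    Assumption: an area exactly on a boundary falls into the larger range.
    """
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def matched_thresholds(pred_box, gt_box):
    """IoU thresholds at which this prediction would count as a match."""
    iou = box_iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

For example, a prediction whose IoU with its ground-truth box is about .82 counts as a match at AP@.50 and AP@.75 but not at the stricter thresholds from .85 upward, which is why AP is typically lower than AP@.50 for the same model.
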
## Benchmark datasets

Object detection is a diverse task, and requirements vary greatly across domains such as autonomous driving, medical imaging, and aerial imaging. With this in mind, we would like to know which specific domains or applications you believe our benchmarks should cover. Please [join our discussion]() and share your suggestions.

### COCO dataset

The Microsoft Common Objects in Context (COCO) dataset is a highly regarded benchmark for object detection models thanks to its set of 80 object categories, its large volume of images with complex scenes, and its high-quality, manually annotated labels. Its support for multiple computer vision tasks, its standardized format, and its active community also make it a robust, challenging benchmark whose results are easy to compare.

The COCO 2017 validation set used for benchmarking is available on the 🤗 Hub: [coco2017]()

## 📚 Useful Readings

For further insight into the subject, you may find the following readings helpful:
- [A Survey on Performance Metrics for Object-Detection Algorithms](), R Padilla, SL Netto, EAB Da Silva - *IWSSIP, 2020*
- [A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit](), R Padilla, WL Passos, TLB Dias, SL Netto, EAB Da Silva - *Journal Electronics, 2021*
'''