from pathlib import Path
# Directory where model evaluation requests are stored
DIR_OUTPUT_REQUESTS = Path("requested_models")
EVAL_REQUESTS_PATH = Path("eval_requests")
##########################
# Text definitions #
##########################
banner_url = "https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard/resolve/main/assets/leaderboard_small.png"
BANNER = f'<div style="display: flex; justify-content: space-around;"><img src="{banner_url}" alt="Banner" style="width: 40vw; min-width: 300px; max-width: 600px;"> </div>'
TITLE = "<html> <head> <style> h1 {text-align: center;} </style> </head> <body> <h1> 🤗 Open Object Detection Leaderboard </h1> </body> </html>"
INTRODUCTION_TEXT = "📝 The 🤗 Open Object Detection Leaderboard aims to track, rank, and evaluate vision models \
available on the Hub that are designed to detect objects in images. \
Anyone from the community can request a model to be evaluated and added to the leaderboard. \
\nCheck the 📈 Metrics tab to understand how the models are evaluated. \
\nIf you want results for a model that is not listed here, you can ✉️✨ request results for it."
CITATION_TEXT = '''@misc{open-od-leaderboard,
author = {Rafael Padilla and Amy Roberts and the Hugging Face Team},
title = {Open Object Detection Leaderboard},
year = {2023},
publisher = {Hugging Face},
howpublished = "\\url{https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard}"
}
'''
METRICS_TAB_TEXT = '''
🎯 Average Precision and Average Recall are popular metrics for evaluating the accuracy of object detectors, as they summarize the Precision-Recall relationship.
Here you will find details about the object detection metrics reported in our leaderboard.
# Metrics
There are plenty of variations of these metrics, depending on the IoU threshold, the area of the object, and the number of detections per image. The most popular ones are:
## Average Precision (AP)
- **AP**: AP at IoU=.50:.05:.95
- **AP@.50 (AP<sup>IoU=.50</sup>)**: AP at IoU=.50 (similar to the PASCAL VOC mAP metric)
- **AP@.75 (AP<sup>IoU=.75</sup>)**: AP at IoU=.75 (strict metric)
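To make the IoU threshold concrete, here is a minimal, self-contained sketch (toy boxes and plain Python, not our actual evaluation code) of how IoU decides whether a detection counts as a true positive:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive at a given IoU threshold only if
# its overlap with a ground-truth box meets that threshold.
det, gt = (0, 0, 10, 10), (5, 0, 15, 10)
print(iou(det, gt))  # 50/150 = 1/3: a match at IoU=.25, but not at IoU=.50
```

The overall **AP** averages the AP values obtained at the 10 thresholds IoU=.50, .55, ..., .95, rewarding detectors whose boxes are tightly localized.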
## Average Precision Across Scales
- **AP-S (AP<sup>small</sup>)**: AP for small objects: area < 32².
- **AP-M (AP<sup>medium</sup>)**: AP for medium objects: 32² < area < 96².
- **AP-L (AP<sup>large</sup>)**: AP for large objects: area > 96².
## Average Recall (AR)
- **AR@1 (AR<sup>max=1</sup>)**: AR given at most 1 detection per image.
- **AR@10 (AR<sup>max=10</sup>)**: AR given at most 10 detections per image.
- **AR@100 (AR<sup>max=100</sup>)**: AR given at most 100 detections per image.
## Average Recall Across Scales
- **AR-S (AR<sup>small</sup>)**: AR for small objects: area < 32².
- **AR-M (AR<sup>medium</sup>)**: AR for medium objects: 32² < area < 96².
- **AR-L (AR<sup>large</sup>)**: AR for large objects: area > 96².
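The size buckets above can be sketched as follows (note: COCO buckets objects by their annotated area; for simplicity this illustration uses the bounding-box area):

```python
# COCO-style size buckets: thresholds are on object area, in pixels squared.
SMALL_MAX = 32 ** 2    # 1024
MEDIUM_MAX = 96 ** 2   # 9216

def size_bucket(width, height):
    """Classify a box by area, as the scale-specific AP/AR metrics do."""
    area = width * height
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"

print(size_bucket(20, 20))    # area 400   -> "small"
print(size_bucket(50, 50))    # area 2500  -> "medium"
print(size_bucket(100, 100))  # area 10000 -> "large"
```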
## Frames Per Second (FPS)
We measure the frames per second (FPS) of each model by averaging its per-image processing time across the whole dataset. This includes the pre- and post-processing steps.
The hardware we use definitely plays a role in these numbers, so the results table also shows which hardware was used. 📈
Because each model has its own specific batch size and memory requirements, we decided to test them all with just 1 image per batch. One thing to keep in mind: this test setup might not fully reflect real-world scenarios, where more images are typically processed together to get things moving faster. 🚀
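Conceptually, the measurement boils down to timing the full per-image pipeline at batch size 1 and dividing the image count by the total elapsed time. A minimal sketch (`measure_fps` and the dummy `predict` below are illustrative stand-ins, not our actual harness):

```python
import time

def measure_fps(predict, dataset):
    """Average end-to-end throughput at batch size 1, including pre/post-processing."""
    start = time.perf_counter()
    for image in dataset:
        predict(image)  # pre-process -> forward pass -> post-process
    elapsed = time.perf_counter() - start
    return len(dataset) / elapsed

# Stand-in for a real detection pipeline: pretend each image takes ~1 ms.
fps = measure_fps(lambda image: time.sleep(0.001), range(50))
print(f"{fps:.1f} FPS")
```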
## How to reproduce our results
To compute these metrics, various tools employ different methods. For this leaderboard's evaluation, we utilize the COCO evaluation approach, which can be found in the [COCO evaluation toolkit](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py).
The 🤗 `Evaluate` metric used to measure the results is also accessible on the Hub: [detection_metrics](https://huggingface.co/spaces/rafaelpadilla/detection_metrics)
It is essential to note that slight differences may arise between the results presented here and those reported elsewhere. These can be attributed to numerical approximations, variations in batch size, and other hyperparameters. To ensure a consistent evaluation, we recommend using a batch size of 1 and setting the confidence threshold to 0 when evaluating your model.
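At its core, the COCO approach ranks detections by confidence, builds a precision-recall curve, and averages the interpolated precision over 101 recall points. Here is a simplified sketch for a single class at a single IoU threshold (matching detections to ground-truth boxes is assumed to have been done already; see the toolkit above for the full procedure):

```python
def average_precision(detections, num_gt):
    """COCO-style 101-point interpolated AP for one class at one IoU threshold.

    `detections` is a list of (confidence, is_true_positive) pairs; matching
    each detection to a ground-truth box is assumed to be done beforehand.
    """
    detections = sorted(detections, key=lambda d: -d[0])  # rank by confidence
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in detections:
        tp += is_tp
        fp += not is_tp
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Average the best achievable precision at each of 101 recall points,
    # as pycocotools does with its recThrs = 0, .01, ..., 1.
    ap = 0.0
    for i in range(101):
        r = i / 100
        candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
        ap += max(candidates, default=0.0)
    return ap / 101

# Two ground-truth boxes; the 0.8-confidence detection was a false positive:
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(round(ap, 3))  # 0.835
```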
## Benchmark datasets
We understand that the object detection task can be quite diverse and the requirements can vary greatly across different domains such as autonomous driving, medical imaging, aerial imaging, etc.
With this in mind, we are interested in knowing if there are specific domains or applications that you believe should be addressed by our benchmarks. Please, [join our discussion](https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard/discussions/1) and give your suggestion.
### COCO dataset
The Microsoft Common Objects in Context (COCO) dataset is a highly regarded benchmark for object detection models due to its comprehensive set of 80 object categories, extensive volume of images with complex scenes, and high-quality, manually annotated labels.
Moreover, its versatility in supporting multiple computer vision tasks, along with a standardized format and an active community, makes it a robust, challenging, and easily comparable benchmark.
The benchmarking COCO validation 2017 dataset is available on the 🤗 Hub: [coco2017](https://huggingface.co/datasets/rafaelpadilla/coco2017)
## 📚 Useful Readings
For further insight into the subject, you may find the following readings helpful:
- [A Survey on Performance Metrics for Object-Detection Algorithms](https://www.researchgate.net/profile/Rafael-Padilla/publication/343194514_A_Survey_on_Performance_Metrics_for_Object-Detection_Algorithms/links/5f1b5a5e45851515ef478268/A-Survey-on-Performance-Metrics-for-Object-Detection-Algorithms.pdf), R Padilla, SL Netto, EAB Da Silva - *IWSSIP, 2020*
- [A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit](https://www.mdpi.com/2079-9292/10/3/279/pdf), R Padilla, WL Passos, TLB Dias, SL Netto, EAB Da Silva - *Journal Electronics, 2021*
'''