mAP drop

#3
by mhyatt000 - opened

I tried to reproduce the results mentioned on this model card. The mAP I obtained does not match the mAP claimed in the model card.

  • Claimed mAP: 42.0
  • Obtained mAP: 39.7

Here are the details for my validation:

  • I instantiated the pre-trained model with transformers.pipeline() and used the COCO API to compute AP from the detected bounding boxes (sketched below, after this list).
  • Evaluation was performed on macOS (CPU).
  • The dataset was downloaded from cocodataset.org.
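
Here is a minimal sketch of the kind of evaluation loop I ran (the file paths and label-to-id mapping are illustrative, not my exact script):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from PIL import Image
from transformers import pipeline

# paths assume a local copy of COCO val2017 and its annotations
coco_gt = COCO("annotations/instances_val2017.json")
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

# map the predicted label names back to COCO category ids
name_to_id = {c["name"]: c["id"] for c in coco_gt.loadCats(coco_gt.getCatIds())}

results = []
for img_id in coco_gt.getImgIds():
    info = coco_gt.loadImgs(img_id)[0]
    image = Image.open(f"val2017/{info['file_name']}").convert("RGB")
    for det in detector(image):  # the pipeline applies its default score threshold
        box = det["box"]  # xmin/ymin/xmax/ymax in absolute pixels
        results.append({
            "image_id": img_id,
            "category_id": name_to_id[det["label"]],
            "bbox": [box["xmin"], box["ymin"],
                     box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]],
            "score": det["score"],
        })

coco_dt = coco_gt.loadRes(results)
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

The resulting COCO summary: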

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.590
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.420
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.470
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.483
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.238
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.691

Hi,

Thanks for validating. When porting the model to the HuggingFace format, I made sure the logits and pred_boxes match exactly on the same input data, as seen here.

Additionally, the image transformations used during validation can be found here. Images are resized with a minimum size of 800 and a maximum size of 1333. The pipeline uses DetrFeatureExtractor behind the scenes to prepare the images + targets for the model, and it performs the same transformations, as seen here.
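
As a quick check (assuming a recent transformers version, which exposes the resize settings as a dict), you can inspect the processor's configuration directly:

from transformers import DetrImageProcessor

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
# expected to print something like {"shortest_edge": 800, "longest_edge": 1333}
print(processor.size)
# ImageNet mean/std used for normalization
print(processor.image_mean, processor.image_std)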

Did you evaluate on COCO 2017?

Thanks for your help. Yes, I evaluated on COCO 2017.

Why does the validated model use a different preprocessing transform than the one provided by DetrFeatureExtractor? Does this explain the full discrepancy between my 39.7 mAP and the reported 42.0 mAP?

Could you clarify the difference? They should be equivalent.

We can test this by preparing an image with both DetrImageProcessor (previously called feature extractor) and the original DETR transforms, like so (after pip-installing transformers and git-cloning the original DETR repo):

from transformers import DetrImageProcessor
import requests
from PIL import Image
import torch

from datasets.coco import make_coco_transforms

# load image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# prepare using image processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
pixel_values = processor(image, return_tensors="pt").pixel_values

# prepare using original code
original_transforms = make_coco_transforms("val")
original_pixel_values = original_transforms(image, None)[0].unsqueeze(0)

assert torch.allclose(pixel_values, original_pixel_values, atol=1e-4)

This passes locally for me.

I'd recommend taking a look at this notebook to evaluate the performance: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Evaluating_DETR_on_COCO_validation_2017.ipynb.

I wouldn't use the pipeline to evaluate the model, as it applies a default confidence threshold that filters out low-scoring detections, whereas COCO mAP expects all detections ranked by score.
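
For example, here is a rough sketch (the zero threshold and single-image handling are illustrative) of post-processing the raw model outputs so that all detections are kept for the COCO evaluator:

import torch
import requests
from PIL import Image
from transformers import DetrForObjectDetection, DetrImageProcessor

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
model.eval()

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# threshold=0.0 keeps every query's prediction, so the COCO evaluator can
# rank detections by score itself instead of having them pre-filtered
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.0, target_sizes=target_sizes
)[0]
print(detections["scores"].shape, detections["labels"].shape, detections["boxes"].shape)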


@mhyatt000 we have reproduced the DETR results on our open detection leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.

nielsr changed discussion status to closed
