Nemotron-OCR-v2 performance on realistic synthetic bank statements + polygon request

#7
by AlroWilde - opened

Hi NVIDIA team,

Thanks for releasing Nemotron-OCR-v2.

I'm the author of synthetic-engine, a tool that generates highly realistic business documents (bank statements, invoices, etc.) with annotations for OCR/VLM training and evaluation.

I tested the model on synthetic bank statements generated by my engine. Performance was noticeably weaker than I expected, especially on images with perspective distortion and complex layouts.

Example

Suggestion:
It would be very helpful if the model could support polygon bounding boxes (in addition to rectangles), so that text regions affected by distortion or perspective are handled better.
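To illustrate why polygons matter here, below is a minimal sketch in plain Python. It is not tied to the model's actual output format. The function names and the sample coordinates are hypothetical and chosen only for illustration. It measures how much of an axis-aligned rectangle is actually covered by a skewed four-point text region.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_rectangle(points: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) around the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def rectangle_fill_ratio(points: List[Point]) -> float:
    """Fraction of the enclosing rectangle that the polygon actually covers."""
    x_min, y_min, x_max, y_max = enclosing_rectangle(points)
    rect_area = (x_max - x_min) * (y_max - y_min)
    if rect_area == 0:
        return 0.0
    return polygon_area(points) / rect_area


# Hypothetical text line photographed at an angle (pixel coordinates,
# listed clockwise from the top-left corner). Not real model output.
skewed_line: List[Point] = [(10.0, 20.0), (210.0, 60.0), (205.0, 85.0), (5.0, 45.0)]

print(f"fill ratio: {rectangle_fill_ratio(skewed_line):.2f}")
```

For this made-up line the rectangle is mostly background: the polygon covers only about 39% of it. On dense layouts such as statement tables, the remaining area tends to overlap neighbouring rows, which is where a rectangle-only output loses information.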

Happy to share more synthetic samples and ground truth if useful.

Best regards,
Alro Wilde
