Nemotron-OCR-v2 performance on realistic synthetic bank statements + polygon request
#7
by AlroWilde - opened
Hi NVIDIA team,
Thanks for releasing Nemotron-OCR-v2.
I'm the author of synthetic-engine, a tool that generates highly realistic business documents (bank statements, invoices, etc.) with annotations for OCR/VLM training and evaluation.
I tested the model on synthetic bank statements produced by my engine. Performance was noticeably weaker than I expected, especially on images with perspective distortion and complex layouts.
Suggestion:
It would be very helpful if the model could support polygon bounding boxes (in addition to rectangles) to better handle text regions affected by distortion or perspective.
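To illustrate why polygons matter for skewed text, here is a minimal sketch in plain Python. The quadrilateral coordinates are hypothetical, chosen only for illustration, and are not output from Nemotron-OCR-v2 or from synthetic-engine. The sketch compares the area of a four-point text region with the area of the axis-aligned rectangle that encloses it.

```python
# Hypothetical four-point region for one text line under perspective
# distortion, as (x, y) pixel coordinates clockwise from the top-left corner.
# These values are made up for illustration; they are not real model output.
quad = [(100.0, 200.0), (400.0, 150.0), (410.0, 190.0), (105.0, 245.0)]


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def enclosing_rectangle(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) around the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def rectangle_area(rect):
    x_min, y_min, x_max, y_max = rect
    return (x_max - x_min) * (y_max - y_min)


rect = enclosing_rectangle(quad)
# Fraction of the rectangle that is actually covered by the text region.
fill_ratio = polygon_area(quad) / rectangle_area(rect)
print(f"rectangle={rect} fill_ratio={fill_ratio:.2f}")
```

For this made-up region the text covers less than half of its enclosing rectangle, so a rectangle-only annotation would also capture a large amount of neighbouring content, which is a particular problem in dense layouts such as bank statement tables.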
Happy to share more synthetic samples and ground truth if useful.
Best regards,
Alro Wilde
