hakunamatata1997 
posted an update 29 days ago
Can someone suggest a good open-source vision model that performs well at OCR?

Interested too. Is there no OCR leaderboard?

cc @merve, who likely knows

@HakunaMatata1997 hello!
Off the top of my head, I can't think of an OCR model specifically; I was mostly using easyocr. OCR is a problem that is pretty much solved, so most of the AI work around documents is focused on document understanding (because it's more than image -> text: it involves text, charts, tables, the whole layout, and more).
If you really want OCR, there are models like https://huggingface.co/facebook/nougat-base, which converts PDFs to markdown, for instance.
I can also recommend some models for document understanding in general (which work on text + chart + image + layout), either zero-shot or as a backbone to fine-tune.
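
For anyone who wants to try the easyocr route mentioned above, here is a minimal sketch, assuming `pip install easyocr` and an English-language image. `easyocr.Reader` and `readtext` are the library's own calls; the helper names `join_text` and `ocr_image`, the 0.3 confidence cutoff, and the file name are made up for illustration.

```python
from typing import List, Sequence, Tuple

# One easyocr detection: (bounding-box corners, recognized text, confidence in [0, 1]).
Detection = Tuple[Sequence, str, float]


def join_text(detections: List[Detection], min_conf: float = 0.3) -> str:
    """Keep detections at or above min_conf and join their text, one per line.

    The 0.3 default is an arbitrary illustrative cutoff, not a recommended value.
    """
    return "\n".join(text for _box, text, conf in detections if conf >= min_conf)


def ocr_image(path: str, languages: Sequence[str] = ("en",)) -> str:
    """Run easyocr on one image file and return the filtered text."""
    import easyocr  # imported lazily so join_text works without the package

    reader = easyocr.Reader(list(languages))  # downloads model weights on first use
    return join_text(reader.readtext(path))


# Usage (hypothetical file name):
#   print(ocr_image("receipt.png"))
```

Lowering `min_conf` keeps more of the noisy detections, raising it drops them; what works best probably depends on your images.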

@merve more specifically, I mean something that understands text in images well enough that the VLM's responses are accurate.

You can check out this one as well: https://huggingface.co/spaces/mindee/doctr

If you need the model both to do some difficult reasoning about the information in the image and to output the text in the image as is:
QwenVL-Base, MiniCPM-Llama3-V-2_5, Fuyu-8B

There are also some good OCR-related leaderboards, where you can find a lot of very strong models.
For example, OCRBench recasts many evaluations from the earlier two-stage OCR era into a format suited to end-to-end models.
I recently came across one called reka-vibe-eval, which asks many questions about rich documents.

If you want the model to do only OCR, I think you can try the Paddle series (PaddleOCR).