Tesseract vs. GOT-OCR2_0: Which Performs Better for Text Extraction from Images?

#17
by bubbleMilkTea - opened

I'm curious about the differences between Tesseract and GOT-OCR2_0. Which one performs better?
My main goal is to convert an image file into general Markdown format. Do you recommend using GOT-OCR2_0's plain text OCR to extract text and then applying markdownify, or using GOT-OCR2_0's formatted text OCR to extract Mathpix Markdown and convert it to general Markdown? Which approach would be more efficient, and which would you recommend?

I had the same question. My goal was to extract mathematical expressions from photos, and in my experience GOT-OCR2_0 handles that far better than Tesseract.
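
For the second approach in the question (formatted OCR, then conversion to general Markdown), the conversion step can be quite small. Here is a minimal sketch, assuming the formatted output marks math with the LaTeX-style delimiters `\( ... \)` and `\[ ... \]`, and that your target Markdown renderer expects `$ ... $` and `$$ ... $$`. The function name `mathpix_to_plain_markdown` is made up for illustration, and it only touches math delimiters, so tables or other Mathpix-specific syntax would need their own handling.

```python
import re


def mathpix_to_plain_markdown(text: str) -> str:
    """Rewrite \\[...\\] and \\(...\\) math delimiters as $$...$$ and $...$.

    Assumption: the OCR output uses these LaTeX-style delimiters.
    Everything outside the delimiters is left unchanged.
    """
    # Display math first. DOTALL lets a block span several lines.
    text = re.sub(
        r"\\\[(.+?)\\\]",
        lambda m: "$$" + m.group(1).strip() + "$$",
        text,
        flags=re.DOTALL,
    )
    # Inline math. A lambda avoids backslash escaping in the replacement.
    text = re.sub(
        r"\\\((.+?)\\\)",
        lambda m: "$" + m.group(1).strip() + "$",
        text,
        flags=re.DOTALL,
    )
    return text


sample = r"Energy: \( E = mc^2 \) and \[ a^2 + b^2 = c^2 \]"
converted = mathpix_to_plain_markdown(sample)
print(converted)
```

If the plain text OCR mode already gives you what you need (no formulas or tables), this step is unnecessary and the output can be used directly.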
