title: Pix2Text
emoji: ♾️
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit

Pix2Text (P2T)

Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it already covers Mathpix's core functionality. Starting from V0.2, P2T can recognize mixed images that contain both text and formulas, producing output similar to Mathpix. How P2T works is shown below (text recognition supports both Chinese and English):

Pix2Text workflow

P2T uses the open-source tool CnSTD to detect where mathematical formulas are located in an image. Each detected region is passed to P2T's own formula recognition engine (LatexOCR), which produces the LaTeX representation of that formula. The rest of the image is handled by a text recognition engine (CnOCR or EasyOCR) for text detection and recognition. Finally, P2T merges all of these results into a single output for the image. Thanks to these great open-source projects!
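The detect, recognize, and merge flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not P2T's actual implementation or API: `Piece`, `merge_pieces`, `parse_image`, and the three callables passed in (`detect_formulas`, `recognize_formula`, `recognize_text`) are hypothetical names standing in for CnSTD, LatexOCR, and CnOCR/EasyOCR, and the line-grouping rule is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box format for this sketch.
Box = Tuple[int, int, int, int]


@dataclass
class Piece:
    box: Box
    kind: str  # "formula" or "text"
    text: str  # LaTeX for formulas, plain text otherwise


def merge_pieces(pieces: Sequence[Piece], line_tolerance: int = 10) -> str:
    """Group pieces into lines by their top edge, then order each line left to right."""
    ordered = sorted(pieces, key=lambda p: (p.box[1], p.box[0]))
    lines: List[List[Piece]] = []
    for piece in ordered:
        # A piece joins the current line if its top edge is close to that line's first piece.
        if lines and abs(piece.box[1] - lines[-1][0].box[1]) <= line_tolerance:
            lines[-1].append(piece)
        else:
            lines.append([piece])
    rendered = []
    for line in lines:
        line.sort(key=lambda p: p.box[0])
        # Formulas are wrapped in $...$ so they can sit inline with ordinary text.
        rendered.append(
            " ".join(f"${p.text}$" if p.kind == "formula" else p.text for p in line)
        )
    return "\n".join(rendered)


def parse_image(
    image: object,
    detect_formulas: Callable[[object], List[Box]],
    recognize_formula: Callable[[object, Box], str],
    recognize_text: Callable[[object, List[Box]], List[Tuple[Box, str]]],
) -> str:
    """Hypothetical pipeline: formula detection, formula OCR, text OCR, then merging."""
    formula_boxes = detect_formulas(image)
    pieces = [Piece(box, "formula", recognize_formula(image, box)) for box in formula_boxes]
    # The text engine receives the formula boxes so it can skip those regions.
    pieces += [Piece(box, "text", text) for box, text in recognize_text(image, formula_boxes)]
    return merge_pieces(pieces)
```

Passing the engines in as plain callables keeps the sketch independent of any particular OCR library, so each stage can be swapped or stubbed out when experimenting.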

For beginners who are not familiar with Python, we also provide the free-to-use P2T Online Service: upload an image and it returns the P2T parsing results. The online service uses the latest models and performs better than the open-source ones.

The author also maintains the Planet of Knowledge P2T/CnOCR/CnSTD Private Group; you are welcome to join. The group gradually releases private materials related to P2T/CnOCR/CnSTD, including non-public models, discounts on paid models, and answers to problems encountered during use. It also shares the latest research materials related to VIE/OCR/STD.