metadata

title: 🙋🏻‍♂️Welcome to Tonic's🫴🏻📸GOT-OCR

GOT-OCR Model Overview

GOT-OCR is a 580M-parameter OCR model designed to process a wide range of "characters." It pairs a high-compression encoder with a long-context decoder and handles both scene and document-style images. The model also supports multi-page and dynamic-resolution OCR.

Output Formats

The model can generate results in several formats:

Plain Text
Markdown
TikZ diagrams
Molecular SMILES strings

Additionally, interactive OCR enables users to define regions of interest via coordinates or colors.

Key Features

Plain Text OCR: Extracts text from images.
Formatted Text OCR: Retains the original formatting, including tables and formulas.
Fine-grained OCR: Offers box-based and color-based OCR for precision in specific regions.
Multi-crop OCR: Handles multiple cropped sections within an image.
Rendered Formatted OCR: Outputs in markdown, TikZ, SMILES, and more, with rendered formatting.
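
As a rough illustration of how the features above might map onto model calls, here is a minimal sketch. The task labels and the `plan_call` helper are hypothetical. The method names (`chat`, `chat_crop`) and keyword names (`ocr_type`, `ocr_box`, `ocr_color`, `render`, `save_render_file`) are assumptions based on the public GOT-OCR2_0 model card, not something this page confirms, so check them against the repository before relying on them.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Colors assumed to be accepted for color-based fine-grained OCR.
ASSUMED_COLORS = {"red", "green", "blue"}


def plan_call(
    task: str,
    box: Optional[Sequence[int]] = None,
    color: Optional[str] = None,
    render_path: str = "./result.html",
) -> Tuple[str, Dict[str, Any]]:
    """Return (method_name, keyword_arguments) for a feature.

    Task labels are hypothetical; method and keyword names are assumptions
    taken from the GOT-OCR2_0 model card.
    """
    if task == "plain":
        return "chat", {"ocr_type": "ocr"}
    if task == "formatted":
        return "chat", {"ocr_type": "format"}
    if task == "fine-grained-box":
        if box is None or len(box) != 4:
            raise ValueError("box must be [x1, y1, x2, y2]")
        x1, y1, x2, y2 = box
        # The box is assumed to be passed as a string such as "[10,20,300,400]".
        return "chat", {"ocr_type": "ocr", "ocr_box": f"[{x1},{y1},{x2},{y2}]"}
    if task == "fine-grained-color":
        if color not in ASSUMED_COLORS:
            raise ValueError(f"color must be one of {sorted(ASSUMED_COLORS)}")
        return "chat", {"ocr_type": "ocr", "ocr_color": color}
    if task == "multi-crop":
        # Multi-crop is assumed to use a separate method on the model.
        return "chat_crop", {"ocr_type": "ocr"}
    if task == "rendered":
        return "chat", {
            "ocr_type": "format",
            "render": True,
            "save_render_file": render_path,
        }
    raise ValueError(f"unknown task: {task!r}")
```

With a loaded model, the plan could be applied as `method, kwargs = plan_call("formatted")` followed by `getattr(model, method)(tokenizer, image_path, **kwargs)`, where `model`, `tokenizer`, and `image_path` are supplied by the caller.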

Supported Content Types

Plain text
Math/molecular formulas
Tables and charts
Sheet music
Geometric shapes

How to Use

Select a task from the dropdown menu.
Upload an image.
(Optional) Adjust parameters based on the selected task.
Click Process to view the results.
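
The steps above could be wired up in Gradio roughly as follows. This is a minimal sketch, not the code behind this demo: the `process` and `build_demo` helpers and the `ocr_fn` callback are hypothetical, the task labels are copied from the Key Features list, and the optional per-task parameters from step 3 are left out for brevity.

```python
from typing import Callable, Optional

# Task labels taken from the Key Features list.
TASKS = [
    "Plain Text OCR",
    "Formatted Text OCR",
    "Fine-grained OCR",
    "Multi-crop OCR",
    "Rendered Formatted OCR",
]


def process(
    task: str,
    image_path: Optional[str],
    ocr_fn: Callable[[str, str], str],
) -> str:
    """Validate the inputs, then hand off to a caller-supplied OCR function."""
    if task not in TASKS:
        return f"Unknown task: {task}"
    if not image_path:
        return "Please upload an image first."
    return ocr_fn(task, image_path)


def build_demo(ocr_fn: Callable[[str, str], str]):
    """Build a small UI: dropdown, image upload, Process button, result box."""
    import gradio as gr  # imported lazily so process() works without Gradio

    with gr.Blocks() as demo:
        task = gr.Dropdown(choices=TASKS, value=TASKS[0], label="Task")
        image = gr.Image(type="filepath", label="Image")
        button = gr.Button("Process")
        output = gr.Textbox(label="Result")
        button.click(
            lambda t, img: process(t, img, ocr_fn),
            inputs=[task, image],
            outputs=output,
        )
    return demo
```

Calling `build_demo(my_ocr_fn).launch()` would start the interface, where `my_ocr_fn` is whatever function actually runs the model.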

Model Information

Model Name: GOT-OCR 2.0
Hugging Face Repository: ucaslcl/GOT-OCR2_0
Environment: CUDA 11.8 + PyTorch 2.0.1
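
For readers who want to try the model outside this demo, here is a hedged sketch of checking the environment and loading the checkpoint with `transformers`. The helper names are hypothetical. The loading arguments follow the public model card as best recalled and should be treated as assumptions, and `trust_remote_code=True` executes code from the repository, so review it first.

```python
from typing import Optional

DOCUMENTED_TORCH = "2.0.1"
DOCUMENTED_CUDA = "11.8"


def matches_documented_env(torch_version: str, cuda_version: Optional[str]) -> bool:
    """Compare versions against the environment listed above.

    torch.__version__ may carry a local suffix such as "+cu118",
    which is stripped before comparing.
    """
    base = torch_version.split("+")[0]
    return base == DOCUMENTED_TORCH and cuda_version == DOCUMENTED_CUDA


def load_got_ocr(repo_id: str = "ucaslcl/GOT-OCR2_0"):
    """Load tokenizer and model (assumes a CUDA GPU and network access)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    if not matches_documented_env(torch.__version__, torch.version.cuda):
        print("Note: environment differs from CUDA 11.8 + PyTorch 2.0.1.")

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        repo_id,
        trust_remote_code=True,  # the model class lives in the repository
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer, model.eval().cuda()
```

After `tokenizer, model = load_got_ocr()`, plain-text OCR would presumably be `model.chat(tokenizer, "page.png", ocr_type="ocr")`, where `page.png` is a placeholder path and the `chat` signature is an assumption from the model card.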

Join us:

🌟TeamTonic🌟 is always making cool demos! Join our active builders' 🛠️ community 👻 on 🤗 Hugging Face: MultiTransformer, and on 🌐 GitHub: Tonic-AI, and contribute to 🌟 Build Tonic 🤗. Big thanks to Yuvi Sharma and all the folks at Hugging Face for the community grant 🤗