title: 🙋🏻♂️Welcome to Tonic's🫴🏻📸GOT-OCR
GOT-OCR Model Overview
GOT-OCR is a 580M-parameter OCR model designed to process a wide range of "characters." It pairs a high-compression encoder with a long-context decoder and handles both scene-text and document-style images. The model also supports multi-page and dynamic-resolution OCR.
Output Formats
The model can generate results in several formats:
- Plain Text
- Markdown
- TikZ diagrams
- Molecular SMILES strings
Additionally, interactive OCR enables users to define regions of interest via coordinates or colors.
Key Features
- Plain Text OCR: Extracts text from images.
- Formatted Text OCR: Retains the original formatting, including tables and formulas.
- Fine-grained OCR: Offers box-based and color-based OCR for precision in specific regions.
- Multi-crop OCR: Handles multiple cropped sections within an image.
- Rendered Formatted OCR: Outputs Markdown, TikZ, SMILES, and more, with rendered formatting.
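The features above can be sketched as a small lookup from feature name to call arguments. This is a minimal sketch, not the Space's actual code: the helper `got_ocr_call` and its feature labels are hypothetical, and the method names and keywords (`chat`, `chat_crop`, `ocr_type`, `ocr_box`, `ocr_color`, `render`, `save_render_file`) are assumed to match the remote-code API described on the model card for the repository listed under Model Information.

```python
from typing import Any, Dict, Optional, Tuple


def got_ocr_call(
    feature: str,
    box: Optional[str] = None,
    color: Optional[str] = None,
    render_path: str = "./demo.html",
) -> Tuple[str, Dict[str, Any]]:
    """Return (method_name, keyword_arguments) for one feature.

    Hypothetical helper: the feature labels are made up for this sketch,
    and the keyword names are assumed from the model card.
    """
    if feature == "plain":
        return "chat", {"ocr_type": "ocr"}
    if feature == "formatted":
        return "chat", {"ocr_type": "format"}
    if feature == "fine-grained":
        # Fine-grained OCR targets one region, chosen by box OR by color.
        if (box is None) == (color is None):
            raise ValueError("pass exactly one of box or color")
        kwargs: Dict[str, Any] = {"ocr_type": "ocr"}
        if box is not None:
            kwargs["ocr_box"] = box
        else:
            kwargs["ocr_color"] = color
        return "chat", kwargs
    if feature == "multi-crop":
        return "chat_crop", {"ocr_type": "ocr"}
    if feature == "rendered":
        return "chat", {
            "ocr_type": "format",
            "render": True,
            "save_render_file": render_path,
        }
    raise ValueError(f"unknown feature: {feature!r}")
```

With a loaded `model` and `tokenizer`, the result could be applied as `method, kwargs = got_ocr_call("formatted")` followed by `getattr(model, method)(tokenizer, "page.png", **kwargs)`. The exact string format expected for `box` and `color` is not stated here, so check the model card before relying on it.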
Supported Content Types
- Plain text
- Math/molecular formulas
- Tables and charts
- Sheet music
- Geometric shapes
How to Use
- Select a task from the dropdown menu.
- Upload an image.
- (Optional) Adjust parameters based on the selected task.
- Click Process to view the results.
Model Information
- Model Name: GOT-OCR 2.0
- Hugging Face Repository: ucaslcl/GOT-OCR2_0
- Environment: CUDA 11.8 + PyTorch 2.0.1
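For running the model outside this Space, a minimal loading sketch follows. It assumes the `transformers` library is installed, a CUDA GPU is available, and the repository ships custom code that must be enabled with `trust_remote_code=True`. The function name `load_got_ocr` is hypothetical, and the `chat` call in the trailing comment is assumed from the model card rather than confirmed here.

```python
MODEL_ID = "ucaslcl/GOT-OCR2_0"  # repository named under Model Information


def load_got_ocr(model_id: str = MODEL_ID):
    """Load tokenizer and model onto the GPU (hypothetical helper)."""
    # Imported lazily so this file can be imported without transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,  # the repo provides its own model class
        low_cpu_mem_usage=True,
        device_map="cuda",
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer, model.eval()


# Example use (downloads the weights, needs a GPU):
#   tokenizer, model = load_got_ocr()
#   text = model.chat(tokenizer, "page.png", ocr_type="ocr")
```

Loading is wrapped in a function so that nothing is downloaded on import. Versions other than the CUDA 11.8 + PyTorch 2.0.1 environment listed above may work but are untested here.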
Join us:
🌟TeamTonic🌟 is always making cool demos! Join our active builders' 🛠️ community 👻. On 🤗 Hugging Face: MultiTransformer. On 🌐 GitHub: Tonic-AI, where you can contribute to 🌟 Build Tonic. 🤗 Big thanks to Yuvi Sharma and all the folks at Hugging Face for the community grant 🤗