Jordan Legg
refactor: make title and description easier to use
b153fc4
title: 🙋🏻‍♂️Welcome to Tonic's🫴🏻📸GOT-OCR

GOT-OCR Model Overview

GOT-OCR is an OCR model with 580M parameters, designed to process a wide range of "characters." It pairs a high-compression encoder with a long-context decoder and handles both scene and document-style images. It also supports multi-page and dynamic-resolution OCR.

Output Formats

The model can generate results in several formats:

  • Plain Text
  • Markdown
  • TikZ diagrams
  • Molecular SMILES strings

Additionally, interactive OCR enables users to define regions of interest via coordinates or colors.

Key Features

  • Plain Text OCR: Extracts text from images.
  • Formatted Text OCR: Retains the original formatting, including tables and formulas.
  • Fine-grained OCR: Offers box-based and color-based OCR for precision in specific regions.
  • Multi-crop OCR: Handles multiple cropped sections within an image.
  • Rendered Formatted OCR: Produces Markdown, TikZ, SMILES, and other formats, and renders the formatted result.
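
For readers who want to script these modes rather than use the demo, the sketch below maps each feature to a call on the model object. It is a minimal illustration, not the demo's actual code: the method names `chat` and `chat_crop` and the parameters `ocr_type`, `ocr_box`, `ocr_color`, `render`, and `save_render_file` are assumed to be provided by the model repository's remote code, the split of "Fine-grained OCR" into "(Box)" and "(Color)" labels is hypothetical, and so are `build_call`, `SUPPORTED_COLORS`, and the default render path.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Assumption: colors accepted by color-based fine-grained OCR.
SUPPORTED_COLORS = ("red", "green", "blue")


def build_call(
    task: str,
    box: Optional[Sequence[int]] = None,
    color: Optional[str] = None,
    render_path: str = "./render.html",  # hypothetical output location
) -> Tuple[str, Dict[str, Any]]:
    """Return (method_name, keyword_arguments) for a task label."""
    if task == "Plain Text OCR":
        return "chat", {"ocr_type": "ocr"}
    if task == "Formatted Text OCR":
        return "chat", {"ocr_type": "format"}
    if task == "Fine-grained OCR (Box)":
        if box is None or len(box) != 4:
            raise ValueError("box must be (x1, y1, x2, y2)")
        x1, y1, x2, y2 = box
        if x1 >= x2 or y1 >= y2:
            raise ValueError("box must satisfy x1 < x2 and y1 < y2")
        # Assumption: the region is passed as a bracketed string.
        return "chat", {"ocr_type": "ocr", "ocr_box": f"[{x1},{y1},{x2},{y2}]"}
    if task == "Fine-grained OCR (Color)":
        if color not in SUPPORTED_COLORS:
            raise ValueError(f"color must be one of {SUPPORTED_COLORS}")
        return "chat", {"ocr_type": "ocr", "ocr_color": color}
    if task == "Multi-crop OCR":
        # Assumption: multi-crop uses a separate method on the model.
        return "chat_crop", {"ocr_type": "ocr"}
    if task == "Rendered Formatted OCR":
        return "chat", {
            "ocr_type": "format",
            "render": True,
            "save_render_file": render_path,
        }
    raise ValueError(f"unknown task: {task!r}")
```

With a loaded `model` and `tokenizer`, the result could then be dispatched as `method, kwargs = build_call("Formatted Text OCR")` followed by `getattr(model, method)(tokenizer, image_path, **kwargs)`. Keeping the mapping in a pure function makes it easy to check without downloading the weights.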

Supported Content Types

  • Plain text
  • Math/molecular formulas
  • Tables and charts
  • Sheet music
  • Geometric shapes

How to Use

  1. Select a task from the dropdown menu.
  2. Upload an image.
  3. (Optional) Adjust parameters based on the selected task.
  4. Click Process to view the results.

Model Information

  • Model Name: GOT-OCR 2.0
  • Hugging Face Repository: ucaslcl/GOT-OCR2_0
  • Environment: CUDA 11.8 + PyTorch 2.0.1
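
To use the model outside the demo, the repository listed above can be loaded with the `transformers` library. The sketch below is one plausible way to do it, under stated assumptions: `load_got_ocr` and `MODEL_ID` are hypothetical names, the repository is assumed to ship its own model class (hence `trust_remote_code=True`), and a CUDA device is assumed to match the environment listed above.

```python
MODEL_ID = "ucaslcl/GOT-OCR2_0"  # repository name from the list above


def load_got_ocr(model_id: str = MODEL_ID, device: str = "cuda"):
    """Load the tokenizer and model; weights are downloaded on first call."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the model class is defined in the repository itself,
    # so remote code must be trusted for loading to succeed.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Switch to inference mode and move the weights to the target device.
    return tokenizer, model.eval().to(device)
```

A typical call would be `tokenizer, model = load_got_ocr()`, after which plain text OCR on an image might look like `model.chat(tokenizer, "page.png", ocr_type="ocr")`, assuming the repository's remote code exposes a `chat` method with that signature. Pass `device="cpu"` only for experimentation, since the listed environment targets CUDA.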

Join Us

🌟TeamTonic🌟 is always making cool demos! Join our active builders' 🛠️ community 👻: join us on Discord, on 🤗 Hugging Face: MultiTransformer, and on 🌐 GitHub: Tonic-AI, and contribute to 🌟 Build Tonic.

🤗 Big thanks to Yuvi Sharma and all the folks at Hugging Face for the community grant 🤗