🖼️ Next OCR 8B

Compact OCR AI — Accurate, Fast, Multilingual, Math-Optimized

License: MIT | Language: Multilingual | HuggingFace


📖 Overview

Next OCR 8B is an 8-billion-parameter model optimized for optical character recognition (OCR), including the understanding of mathematical and tabular content.

It supports multilingual OCR (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian, and others) with high accuracy, including on structured documents such as tables, forms, and formulas.


⚡ Highlights

  • 🖼️ Accurate text extraction, including math and tables
  • 🌍 Multilingual support (30+ languages)
  • ⚡ Lightweight and efficient
  • 💬 Instruction-tuned for document understanding and analysis

📊 Benchmark & Comparison



| Model | OCR-Bench Accuracy (%) | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
|---|---|---|---|
| Next OCR | 99.0 | 96.8 | 95.3 |
| PaddleOCR | 95.2 | 93.9 | 95.3 |
| Deepseek OCR | 90.6 | 87.4 | 86.1 |
| Tesseract | 92.0 | 88.4 | 72.0 |
| EasyOCR | 90.4 | 84.7 | 78.9 |
| Google Cloud Vision / DocAI | 98.7 | 95.5 | 93.6 |
| Amazon Textract | 94.7 | 86.2 | 86.1 |
| Azure Document Intelligence | 95.1 | 93.6 | 91.4 |

| Model | Handwriting (%) | Scene Text (%) | Complex Tables (%) |
|---|---|---|---|
| Next OCR | 92 | 96 | 91 |
| PaddleOCR | 88 | 92 | 90 |
| Deepseek OCR | 80 | 85 | 83 |
| Tesseract | 75 | 88 | 70 |
| EasyOCR | 78 | 86 | 75 |
| Google Cloud Vision / DocAI | 90 | 95 | 92 |
| Amazon Textract | 85 | 90 | 88 |
| Azure Document Intelligence | 87 | 91 | 89 |
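
The tables above do not state how each score was computed. For readers who want to run a comparable check on their own data, one common way to score OCR output against a reference transcription is the character error rate (CER): the Levenshtein edit distance divided by the reference length. The sketch below is only an illustration under that assumption, not the method used for the numbers above; the function names `edit_distance` and `character_error_rate` and the sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (lower is better)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one misread character in a 20-character line.
reference = "Invoice total: 42.50"
hypothesis = "Invoice tota1: 42.50"
print(f"CER: {character_error_rate(reference, hypothesis):.3f}")
```

An accuracy-style figure can then be reported as `1 - CER`, averaged over a test set, though other benchmarks may use word-level or structure-aware metrics instead.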

🚀 Installation & Usage

from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_id = "Lamapi/next-ocr"

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

# Load the input image with Pillow.
img = Image.open("image.jpg")

# ATTENTION: The content list must include both an image and text.
messages = [
    {"role": "system", "content": "You are Next-OCR, a helpful AI assistant trained by Lamapi."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Read the text in this image and summarize it."}
        ]
    }
]

# Apply the chat template correctly
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

print(processor.decode(generated[0], skip_special_tokens=True))
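
For decoder-only models, `generate` typically returns the prompt tokens followed by the newly generated ones, so decoding `generated[0]` directly may echo the prompt before the answer. The sketch below shows one way to build the message list for an arbitrary instruction and to keep only the new tokens before decoding. The helper names `build_messages` and `strip_prompt_tokens` are hypothetical and not part of the model's API, and the sample instruction is an assumption about what the model responds well to.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_messages(image: Any, instruction: str,
                   system_prompt: Optional[str] = None) -> List[Dict[str, Any]]:
    """Return a chat message list with one image and one text instruction."""
    messages: List[Dict[str, Any]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        # The user content carries both the image and the text.
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    })
    return messages


def strip_prompt_tokens(sequences: Sequence[Sequence[int]],
                        prompt_length: int) -> List[List[int]]:
    """Drop the first `prompt_length` token ids from each generated sequence."""
    return [list(seq[prompt_length:]) for seq in sequences]


# Hypothetical instruction; "IMG" stands in for a PIL image here.
messages = build_messages("IMG", "Extract the table in this image as plain text.")

# With real tensors the same trimming is a slice (assumes `inputs` and
# `generated` come from a processor and model.generate call):
#   new_tokens = generated[:, inputs["input_ids"].shape[1]:]
#   text = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

Passing the result of `build_messages` to `processor.apply_chat_template` keeps the prompt construction in one place when the same image is queried with several instructions.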

🧩 Key Features

| Feature | Description |
|---|---|
| 🖼️ High-Accuracy OCR | Extracts text from images, documents, and screenshots reliably. |
| 🇹🇷 Multilingual Support | Works with 30+ languages including Turkish. |
| ⚡ Lightweight & Efficient | Optimized for resource-constrained environments. |
| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas. |
| 🏢 Reliable Outputs | Suitable for enterprise document workflows. |

📐 Model Specifications

| Specification | Details |
|---|---|
| Base Model | Qwen 3 |
| Parameters | 8 Billion |
| Architecture | Vision + Transformer (OCR LLM) |
| Modalities | Image-to-text |
| Fine-Tuning | OCR datasets with multilingual and math/tabular content |
| Optimizations | Quantization-ready, FP16 support |
| Primary Focus | Text extraction, document understanding, mathematical OCR |
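
To make the parameter count and precision options above concrete, the following back-of-the-envelope sketch estimates how much memory the weights alone would occupy at different precisions. It assumes exactly 8 billion parameters and counts only the weights; activations, the KV cache, and image features add to this, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # "8 Billion" from the specification table

# Compare full FP16 weights with two common quantization widths.
for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights need roughly 16 GB in FP16 and about 4 GB at 4 bits per parameter, which is the main reason quantized variants are attractive on smaller GPUs.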

🎯 Ideal Use Cases

  • Document digitization
  • Invoice & receipt processing
  • Multilingual OCR pipelines
  • Table, form, and formula extraction
  • Enterprise document management

📄 License

MIT License — free for commercial & non-commercial use.


📞 Contact & Support


Next OCR — Compact OCR + math-capable AI, blending accuracy, speed, and multilingual document intelligence.

Follow on HuggingFace
