Nanonets OCR2-3B (Fixed for HF Endpoints)

This is a fixed version of nanonets/Nanonets-OCR2-3B that resolves deployment issues on Hugging Face Inference Endpoints.

What's Fixed

The original model had a compatibility issue with HF Endpoints' default transformers==4.48.0:

Error: AttributeError: 'dict' object has no attribute 'to_dict'
Cause: text_config was loaded as a dict instead of PretrainedConfig
Solution: Updated to transformers>=4.55.4 via requirements.txt

Using on HF Inference Endpoints

Option 1: Direct Deployment (Recommended)

Go to Hugging Face Inference Endpoints
Click "New endpoint"
Select this model: nomadarun/Nanonets-OCR2-3B-fixed
Choose GPU instance (recommended: 1x A10G or higher)
Deploy!

The requirements.txt in this repo will automatically install the correct dependencies.

Option 2: Custom Handler (Advanced)

If you need custom preprocessing, create a handler.py:

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

class EndpointHandler:
    def __init__(self, path=""):
        self.model = AutoModelForVision2Seq.from_pretrained(
            path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        ).eval()

        self.processor = AutoProcessor.from_pretrained(
            path,
            trust_remote_code=True
        )

    def __call__(self, data):
        inputs = data.pop("inputs", data)
        # Your inference logic here
        return {"result": "..."}

Local Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

model = AutoModelForVision2Seq.from_pretrained(
    "nomadarun/Nanonets-OCR2-3B-fixed",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(
    "nomadarun/Nanonets-OCR2-3B-fixed",
    trust_remote_code=True
)

# Your OCR inference code here

Requirements

transformers>=4.55.4
accelerate>=0.27.2
torch>=2.0.0
pillow>=10.0.0
qwen-vl-utils

All dependencies are automatically installed from requirements.txt on HF Endpoints.

Model Details

Base Architecture: Qwen2.5-VL
Parameters: 3 Billion
Task: Optical Character Recognition (OCR)
Original Model: nanonets/Nanonets-OCR2-3B

Credits

Original model by Nanonets. This fork only adds compatibility fixes for HF Endpoints deployment.

License

Same as original model license.

Downloads last month: 6

Safetensors

Model size

4B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for axionlab/Nanonets-OCR2-3B-fixed

Quantizations

2 models