YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Nanonets OCR2-3B (Fixed for HF Endpoints)

This is a fixed version of nanonets/Nanonets-OCR2-3B that resolves deployment issues on Hugging Face Inference Endpoints.

What's Fixed

The original model had a compatibility issue with HF Endpoints' default transformers==4.48.0:

  • Error: AttributeError: 'dict' object has no attribute 'to_dict'
  • Cause: text_config was loaded as a dict instead of PretrainedConfig
  • Solution: Updated to transformers>=4.55.4 via requirements.txt

Using on HF Inference Endpoints

Option 1: Direct Deployment (Recommended)

  1. Go to Hugging Face Inference Endpoints
  2. Click "New endpoint"
  3. Select this model: nomadarun/Nanonets-OCR2-3B-fixed
  4. Choose GPU instance (recommended: 1x A10G or higher)
  5. Deploy!

The requirements.txt in this repo will automatically install the correct dependencies.

Option 2: Custom Handler (Advanced)

If you need custom preprocessing, create a handler.py:

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

class EndpointHandler:
    def __init__(self, path=""):
        self.model = AutoModelForVision2Seq.from_pretrained(
            path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        ).eval()

        self.processor = AutoProcessor.from_pretrained(
            path,
            trust_remote_code=True
        )

    def __call__(self, data):
        inputs = data.pop("inputs", data)
        # Your inference logic here
        return {"result": "..."}

Local Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

model = AutoModelForVision2Seq.from_pretrained(
    "nomadarun/Nanonets-OCR2-3B-fixed",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(
    "nomadarun/Nanonets-OCR2-3B-fixed",
    trust_remote_code=True
)

# Your OCR inference code here

Requirements

  • transformers>=4.55.4
  • accelerate>=0.27.2
  • torch>=2.0.0
  • pillow>=10.0.0
  • qwen-vl-utils

All dependencies are automatically installed from requirements.txt on HF Endpoints.

Model Details

  • Base Architecture: Qwen2.5-VL
  • Parameters: 3 Billion
  • Task: Optical Character Recognition (OCR)
  • Original Model: nanonets/Nanonets-OCR2-3B

Credits

Original model by Nanonets. This fork only adds compatibility fixes for HF Endpoints deployment.

License

Same as original model license.

Downloads last month
6
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for axionlab/Nanonets-OCR2-3B-fixed

Quantizations
2 models