YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Nanonets OCR2-3B (Fixed for HF Endpoints)
This is a fixed version of nanonets/Nanonets-OCR2-3B that resolves deployment issues on Hugging Face Inference Endpoints.
What's Fixed
The original model had a compatibility issue with HF Endpoints' default transformers==4.48.0:
- Error:
AttributeError: 'dict' object has no attribute 'to_dict' - Cause:
text_configwas loaded as a dict instead ofPretrainedConfig - Solution: Updated to
transformers>=4.55.4viarequirements.txt
Using on HF Inference Endpoints
Option 1: Direct Deployment (Recommended)
- Go to Hugging Face Inference Endpoints
- Click "New endpoint"
- Select this model:
nomadarun/Nanonets-OCR2-3B-fixed - Choose GPU instance (recommended: 1x A10G or higher)
- Deploy!
The requirements.txt in this repo will automatically install the correct dependencies.
Option 2: Custom Handler (Advanced)
If you need custom preprocessing, create a handler.py:
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch
class EndpointHandler:
def __init__(self, path=""):
self.model = AutoModelForVision2Seq.from_pretrained(
path,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
).eval()
self.processor = AutoProcessor.from_pretrained(
path,
trust_remote_code=True
)
def __call__(self, data):
inputs = data.pop("inputs", data)
# Your inference logic here
return {"result": "..."}
Local Usage
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch
model = AutoModelForVision2Seq.from_pretrained(
"nomadarun/Nanonets-OCR2-3B-fixed",
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
"nomadarun/Nanonets-OCR2-3B-fixed",
trust_remote_code=True
)
# Your OCR inference code here
Requirements
transformers>=4.55.4accelerate>=0.27.2torch>=2.0.0pillow>=10.0.0qwen-vl-utils
All dependencies are automatically installed from requirements.txt on HF Endpoints.
Model Details
- Base Architecture: Qwen2.5-VL
- Parameters: 3 Billion
- Task: Optical Character Recognition (OCR)
- Original Model: nanonets/Nanonets-OCR2-3B
Credits
Original model by Nanonets. This fork only adds compatibility fixes for HF Endpoints deployment.
License
Same as original model license.
- Downloads last month
- 6
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support