How to use gouri100/Unsloth_Qwen-2.5_7B-Invoice-962 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gouri100/Unsloth_Qwen-2.5_7B-Invoice-962 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gouri100/Unsloth_Qwen-2.5_7B-Invoice-962 to start chatting
```
Using Hugging Face Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gouri100/Unsloth_Qwen-2.5_7B-Invoice-962 to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="gouri100/Unsloth_Qwen-2.5_7B-Invoice-962",
    max_seq_length=2048,
)
```
Qwen2.5-VL 7B — Indian Invoice Extraction
Fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct that extracts structured JSON from Indian GST invoices (B2B, B2C, export, IRN/ACK, and multi-layout documents). It was trained with QLoRA and Unsloth on an NVIDIA A100 80 GB. The LoRA adapters were then merged into the base weights with PEFT merge_and_unload().
Available Versions
| Version | Repository | Use case |
|---|---|---|
| Merged bfloat16 | gouri100/Unsloth_Qwen-2.5_7B-Invoice-962 | Unquantized bfloat16 inference (transformers) |
| GGUF Q4_K_M | gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF | llama.cpp / Ollama, lower VRAM |
| GGUF Q8_0 | gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF | llama.cpp / Ollama, higher quality |
Model Summary
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-VL-7B-Instruct |
| Fine-tuning method | QLoRA (r=64, alpha=128) |
| Merge method | PEFT merge_and_unload() — bfloat16 safetensors |
| Framework | Unsloth + TRL SFTTrainer |
| Hardware | NVIDIA A100 80 GB |
| Task | Invoice image to structured JSON |
| Input types | JPG, PNG, PDF (page 1 rendered at 200 DPI) |
| Languages | English, Hindi, Tamil, Malayalam, Telugu, Kannada, Bengali |
| License | Apache 2.0 |
Training Dataset
| Property | Value |
|---|---|
| Total samples | 962 |
| File types | JPG, PNG, PDF |
| PDF handling | Page 1 rendered at 200 DPI, then resized to a maximum of 1280 px |
| Invoice types | B2B GST, B2C, Export, IRN/ACK |
| Annotation | Manually labeled JSON per invoice |
Output JSON Schema
```json
{
  "metadata": {
    "invoice_no": "string",
    "invoice_date": "YYYY-MM-DD",
    "irn": "string | null",
    "ack_no": "string | null",
    "ack_date": "string | null"
  },
  "supplier": {
    "name": "string",
    "gstin": "string",
    "address": "string",
    "state_code": "string"
  },
  "buyer": {
    "name": "string",
    "gstin": "string",
    "address": "string",
    "state_code": "string"
  },
  "line_items": [
    {
      "sl_no": "number",
      "description": "string",
      "hsn_sac": "string",
      "qty": "number",
      "unit": "string",
      "rate": "number",
      "amount": "number"
    }
  ],
  "tax": {
    "taxable_value": "number",
    "cgst_rate": "number",
    "cgst_amount": "number",
    "sgst_rate": "number",
    "sgst_amount": "number",
    "igst_rate": "number",
    "igst_amount": "number",
    "total_tax": "number",
    "grand_total": "number",
    "round_off": "number"
  }
}
```
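If downstream code is type-checked, the schema can be mirrored with typing.TypedDict. This is a sketch and does not ship with the model. The class names and the missing_top_level_keys helper are hypothetical. Every "number" field is typed as float, although sl_no and qty may be integers in practice.

```python
from typing import List, Optional, TypedDict


class Metadata(TypedDict):
    invoice_no: str
    invoice_date: str  # "YYYY-MM-DD"
    irn: Optional[str]
    ack_no: Optional[str]
    ack_date: Optional[str]


class Party(TypedDict):
    # Shared shape of the "supplier" and "buyer" objects
    name: str
    gstin: str
    address: str
    state_code: str


class LineItem(TypedDict):
    sl_no: float
    description: str
    hsn_sac: str
    qty: float
    unit: str
    rate: float
    amount: float


class Tax(TypedDict):
    taxable_value: float
    cgst_rate: float
    cgst_amount: float
    sgst_rate: float
    sgst_amount: float
    igst_rate: float
    igst_amount: float
    total_tax: float
    grand_total: float
    round_off: float


class Invoice(TypedDict):
    metadata: Metadata
    supplier: Party
    buyer: Party
    line_items: List[LineItem]
    tax: Tax


def missing_top_level_keys(obj: dict) -> List[str]:
    """List the Invoice keys that are absent from a parsed model response."""
    return [key for key in Invoice.__annotations__ if key not in obj]
```

TypedDict checks happen only under a static type checker. At runtime, missing_top_level_keys is the only check performed.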
Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Learning rate | 0.0002 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Per-device batch size | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 (2 × 8) |
| Max sequence length | 2048 |
| Precision | bfloat16 |
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision layers fine-tuned | Yes |
| Gradient checkpointing | Unsloth optimized |
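Two rows in this table are derived from others. The effective batch size is the per-device batch size times the gradient accumulation steps (times the GPU count, assumed to be 1 here). The LoRA update is scaled by alpha / r = 128 / 64 = 2. The sketch below also writes out the cosine schedule with linear warmup, following what Hugging Face's get_cosine_schedule_with_warmup does with its defaults. Two things are assumptions: that the Unsloth/TRL run used exactly this schedule, and that warmup steps are ceil(warmup_ratio × total_steps).

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
NUM_GPUS = 1  # assumption: a single A100
LORA_R, LORA_ALPHA = 64, 128
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS  # 16
lora_scaling = LORA_ALPHA / LORA_R  # 2.0, multiplies the low-rank update B @ A


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at optimizer step `step` (0-based).

    Linear warmup from 0 to peak_lr, then a half-cosine decay to 0
    (the num_cycles=0.5 default of get_cosine_schedule_with_warmup).
    """
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

A cosine schedule keeps the rate near its peak for longer than a linear decay does, then tapers it smoothly toward zero by the final step.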
Training Results
| Metric | Value |
|---|---|
| Final training loss | 0.2594 |
| Total steps | Not reported |
| Training time | 2243.16 s (37.4 min) |
| Steps per second | 0.082 |
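The total step count is not reported, but it can be estimated two ways from figures in this card. The arithmetic below assumes a single GPU and counts a final partial accumulation window as one optimizer step. It is a back-of-the-envelope check, not a logged value.

```python
import math

SAMPLES = 962
EPOCHS = 3
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
RUNTIME_S = 2243.16
STEPS_PER_S = 0.082  # reported rounded to three decimals

# Estimate 1: from the dataset size and batch settings
batches_per_epoch = math.ceil(SAMPLES / PER_DEVICE_BATCH)    # 481
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)  # 61
steps_from_config = steps_per_epoch * EPOCHS                 # 183

# Estimate 2: from the reported throughput
steps_from_throughput = RUNTIME_S * STEPS_PER_S              # about 184

print(steps_from_config, round(steps_from_throughput, 1))
```

The two estimates agree to within rounding. A throughput between 0.0815 and 0.0825 steps/s corresponds to roughly 183 to 185 steps.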
Inference
With transformers (merged model)
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch, json

MODEL_ID = "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open('invoice.jpg').convert('RGB')

SYSTEM_PROMPT = (
    'You are an expert system for extracting structured data from invoices. '
    'Return ONLY valid JSON. Do NOT include explanations or extra text.'
)
messages = [
    {'role': 'system', 'content': [{'type': 'text', 'text': SYSTEM_PROMPT}]},
    {'role': 'user', 'content': [
        {'type': 'image', 'image': image},
        {'type': 'text', 'text': 'Extract structured invoice data as JSON.'},
    ]},
]

# Tokenize the prompt and preprocess the image in one call (needs a recent transformers)
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors='pt',
    return_dict=True,
).to(model.device)

with torch.no_grad():
    # Greedy decoding; temperature is ignored when do_sample=False, so it is omitted
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
decoded = processor.decode(
    output_ids[0][inputs['input_ids'].shape[1]:],
    skip_special_tokens=True,
)
result = json.loads(decoded)
print(json.dumps(result, indent=2, ensure_ascii=False))
```
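The system prompt asks for bare JSON. Even so, generative models sometimes wrap their output in a Markdown code fence or add a stray sentence, and json.loads then raises. A defensive parser such as the hypothetical parse_model_json below can stand in for the direct json.loads(decoded) call. It strips a surrounding fence if one is present and otherwise falls back to the outermost {...} span. It is a sketch and is not part of the model or of transformers.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this snippet fence-safe
_FENCE_RE = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE), re.DOTALL)


def parse_model_json(text: str) -> dict:
    """Parse a JSON object from raw model output, tolerating code fences.

    Raises json.JSONDecodeError (a ValueError) if no valid object is found.
    """
    candidate = text.strip()
    fenced = _FENCE_RE.search(candidate)
    if fenced:
        candidate = fenced.group(1)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}"
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(candidate[start:end + 1])
```

Usage: result = parse_model_json(decoded).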
Load in 4-bit (lighter GPUs)
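```python
import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# NF4 4-bit weights with bfloat16 compute (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962",
    quantization_config=bnb_config,
    device_map='auto',
)
# The processor, messages, and generation code are unchanged from the example above
```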
From PDF
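```python
from pdf2image import convert_from_path  # requires the poppler utilities

# Render only page 1 at 200 DPI, matching the training preprocessing
pages = convert_from_path('invoice.pdf', dpi=200, first_page=1, last_page=1)
image = pages[0].convert('RGB')
image.thumbnail((1280, 1280))  # training images were resized to at most 1280 px
# then follow the inference code above
```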
With Ollama (GGUF)
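```shell
# Models on the Hugging Face Hub are addressed with the hf.co/ prefix.
# Without a tag, Ollama uses Q4_K_M when the repo provides it.
ollama run hf.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF
# Select the Q8_0 file explicitly by tag
ollama run hf.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF:Q8_0
```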
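Ollama also serves a local REST API (port 11434 by default), so extraction can be scripted. The standard-library sketch below sends one invoice image to /api/generate and reuses the system prompt from the transformers example. It assumes that the GGUF you pulled supports image input in Ollama. Vision GGUFs usually need a matching projector file, and this card does not say whether one is provided. The helper names build_request and generate are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address
MODEL = "hf.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF"
SYSTEM_PROMPT = (
    "You are an expert system for extracting structured data from invoices. "
    "Return ONLY valid JSON. Do NOT include explanations or extra text."
)


def build_request(image_bytes: bytes,
                  prompt: str = "Extract structured invoice data as JSON.") -> dict:
    """Build a non-streaming /api/generate payload carrying one base64 image."""
    return {
        "model": MODEL,
        "system": SYSTEM_PROMPT,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",  # ask Ollama to constrain the output to JSON
        "stream": False,
        "options": {"temperature": 0},
    }


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to a running Ollama server and return the response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with the server running: open invoice.jpg in binary mode and call generate(build_request(f.read())).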
Limitations
- Optimized for Indian GST invoice formats; accuracy may drop on non-Indian invoice layouts
- Scans below 100 DPI or heavily skewed images reduce accuracy
- Handwritten invoices are not supported
- Multi-page invoices: only page 1 was used during training
- Always validate extracted JSON against your business logic before use
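The last item above can be put into practice with a few GST consistency rules:

- A GSTIN is 15 characters, and its first two digits are the state code.
- The CGST, SGST, and IGST amounts sum to total_tax.
- Taxable value plus total tax plus round-off equals grand_total.

These rules, the GSTIN pattern, and the 1-rupee tolerance are assumptions, so adapt them to your own invoices. For example, discounts or freight charges can break the last sum. validate_invoice is a hypothetical helper that works on the output schema described earlier and expects numeric fields to be numbers, not formatted strings.

```python
import re
from typing import List

# Assumed GSTIN layout: 2-digit state code, 10-char PAN, entity code, "Z", check character
GSTIN_RE = re.compile(r"^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$")


def validate_invoice(inv: dict, tol: float = 1.0) -> List[str]:
    """Return human-readable problems found in an extracted invoice dict."""
    problems: List[str] = []

    for party in ("supplier", "buyer"):
        p = inv.get(party) or {}
        gstin = p.get("gstin")
        if not gstin:
            continue  # e.g. B2C buyers may have no GSTIN
        if not GSTIN_RE.match(gstin):
            problems.append(f"{party}.gstin has an unexpected format: {gstin!r}")
        elif p.get("state_code") and str(p["state_code"]).zfill(2) != gstin[:2]:
            problems.append(f"{party}.state_code does not match GSTIN prefix {gstin[:2]}")

    tax = inv.get("tax") or {}

    def num(key: str) -> float:
        # Missing or null amounts count as 0
        return float(tax.get(key) or 0)

    components = num("cgst_amount") + num("sgst_amount") + num("igst_amount")
    if abs(components - num("total_tax")) > tol:
        problems.append(
            f"CGST+SGST+IGST = {components:.2f} but total_tax = {num('total_tax'):.2f}")

    expected_total = num("taxable_value") + num("total_tax") + num("round_off")
    if abs(expected_total - num("grand_total")) > tol:
        problems.append(
            f"taxable_value+total_tax+round_off = {expected_total:.2f} "
            f"but grand_total = {num('grand_total'):.2f}")
    return problems
```

An empty list means that none of these checks failed. It does not mean the extraction is correct.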
Citation
```bibtex
@misc{qwen2.5-vl-7b-indian-invoice,
  title        = {Qwen2.5-VL-7B Fine-tuned for Indian Invoice Extraction},
  author       = {Your Name},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962}}
}
```
Fine-tuned with Unsloth · Merged with PEFT · Trained on NVIDIA A100 80 GB