Qwen2.5-VL 7B — Indian Invoice Extraction

Fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct, specialized for extracting structured JSON from Indian GST invoices (B2B, B2C, export, IRN/ACK, multi-layout). It was trained with QLoRA and Unsloth on an NVIDIA A100 80 GB, and the LoRA adapter was then merged into the base weights with PEFT merge_and_unload().


Available Versions

| Version | Link | Use case |
| --- | --- | --- |
| Merged bfloat16 | gouri100/Unsloth_Qwen-2.5_7B-Invoice-962 | Unquantized bfloat16 inference |
| GGUF Q4_K_M | gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF | llama.cpp / Ollama, lighter GPUs |
| GGUF Q8_0 | gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF | llama.cpp / Ollama, higher quality |

Model Summary

| Property | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-VL-7B-Instruct |
| Fine-tuning method | QLoRA (r=64, alpha=128) |
| Merge method | PEFT merge_and_unload(), saved as bfloat16 safetensors |
| Framework | Unsloth + TRL SFTTrainer |
| Hardware | NVIDIA A100 80 GB |
| Task | Invoice image to structured JSON |
| Input types | JPG, PNG, PDF (page 1 at 200 DPI) |
| Languages | English, Hindi, Tamil, Malayalam, Telugu, Kannada, Bengali |
| License | Apache 2.0 |

Training Dataset

| Property | Value |
| --- | --- |
| Total samples | 962 |
| File types | JPG, PNG, PDF |
| PDF handling | Page 1 extracted at 200 DPI, resized to max 1280 px |
| Invoice types | B2B GST, B2C, Export, IRN/ACK |
| Annotation | Manually labeled JSON per invoice |

Output JSON Schema

{
  "metadata": {
    "invoice_no": "string",
    "invoice_date": "YYYY-MM-DD",
    "irn": "string | null",
    "ack_no": "string | null",
    "ack_date": "string | null"
  },
  "supplier": {
    "name": "string",
    "gstin": "string",
    "address": "string",
    "state_code": "string"
  },
  "buyer": {
    "name": "string",
    "gstin": "string",
    "address": "string",
    "state_code": "string"
  },
  "line_items": [{
    "sl_no": "number",
    "description": "string",
    "hsn_sac": "string",
    "qty": "number",
    "unit": "string",
    "rate": "number",
    "amount": "number"
  }],
  "tax": {
    "taxable_value": "number",
    "cgst_rate": "number",
    "cgst_amount": "number",
    "sgst_rate": "number",
    "sgst_amount": "number",
    "igst_rate": "number",
    "igst_amount": "number",
    "total_tax": "number",
    "grand_total": "number",
    "round_off": "number"
  }
}
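A structural check against this schema can catch outputs that drop a section or field. The following is a minimal sketch, assuming every key shown above is expected to be present (with null allowed as a value where the schema says so). The names REQUIRED_KEYS, LINE_ITEM_KEYS, and missing_keys are illustrative and are not part of the model or any library.

```python
# Hypothetical helper: report schema keys that are absent from a parsed result.
REQUIRED_KEYS = {
    "metadata": ["invoice_no", "invoice_date", "irn", "ack_no", "ack_date"],
    "supplier": ["name", "gstin", "address", "state_code"],
    "buyer": ["name", "gstin", "address", "state_code"],
    "tax": [
        "taxable_value", "cgst_rate", "cgst_amount", "sgst_rate", "sgst_amount",
        "igst_rate", "igst_amount", "total_tax", "grand_total", "round_off",
    ],
}
LINE_ITEM_KEYS = ["sl_no", "description", "hsn_sac", "qty", "unit", "rate", "amount"]


def missing_keys(result: dict) -> list[str]:
    """Return dotted paths of schema keys that are absent from `result`."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        block = result.get(section)
        if not isinstance(block, dict):
            missing.append(section)
            continue
        missing += [f"{section}.{k}" for k in keys if k not in block]

    items = result.get("line_items")
    if not isinstance(items, list):
        missing.append("line_items")
    else:
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                missing.append(f"line_items[{i}]")
                continue
            missing += [f"line_items[{i}].{k}" for k in LINE_ITEM_KEYS if k not in item]
    return missing
```

An empty list means the output has the expected shape; it says nothing about whether the values themselves are correct.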

Training Configuration

| Hyperparameter | Value |
| --- | --- |
| Epochs | 3 |
| Learning rate | 0.0002 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Per-device batch size | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Max sequence length | 2048 |
| Precision | bfloat16 |
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision layers fine-tuned | Yes |
| Gradient checkpointing | Unsloth optimized |
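The effective batch size listed above is derived from two other values, and the LoRA rank and alpha together set how strongly the adapter update is scaled. A small sketch of both derivations, assuming a single GPU (the card mentions one A100) and the standard LoRA scaling of alpha / r rather than a rank-stabilized variant:

```python
# Values copied from the Training Configuration table.
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: training ran on a single A100

lora_r = 64
lora_alpha = 128

# Number of samples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_r

print(effective_batch_size)  # 16
print(lora_scale)            # 2.0
```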

Training Results

| Metric | Value |
| --- | --- |
| Final training loss | 0.2594 |
| Total steps | N/A |
| Training time | 2243.16 s (37.4 min) |
| Steps per second | 0.082 |
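Total steps are listed as N/A, but the other reported figures allow two rough estimates. This is a sketch under stated assumptions, not a value taken from the training logs: it assumes all 962 samples were used for training (no held-out split) and that a partial final batch counts as a step. Some trainer versions round down instead, which would give 180.

```python
import math

total_samples = 962        # from the Training Dataset table
effective_batch_size = 16  # from the Training Configuration table
epochs = 3
train_runtime_s = 2243.16
steps_per_second = 0.082

# Estimate 1: optimizer steps implied by dataset size, batch size, and epochs.
steps_from_data = math.ceil(total_samples / effective_batch_size) * epochs

# Estimate 2: reported throughput multiplied by reported runtime.
steps_from_throughput = steps_per_second * train_runtime_s

print(steps_from_data)                  # 183
print(round(steps_from_throughput, 1))  # 183.9
```

The two estimates agree to within the rounding of the reported throughput, which suggests a total of roughly 180 to 184 optimizer steps.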

Inference

With transformers (merged model)

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch, json

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962",
    torch_dtype = torch.bfloat16,
    device_map  = 'auto',
)
processor = AutoProcessor.from_pretrained("gouri100/Unsloth_Qwen-2.5_7B-Invoice-962")

image = Image.open('invoice.jpg').convert('RGB')

SYSTEM_PROMPT = (
    'You are an expert system for extracting structured data from invoices. '
    'Return ONLY valid JSON. Do NOT include explanations or extra text.'
)

messages = [
    {'role': 'system', 'content': [{'type': 'text', 'text': SYSTEM_PROMPT}]},
    {'role': 'user', 'content': [
        {'type': 'image', 'image': image},
        {'type': 'text',  'text': 'Extract structured invoice data as JSON.'}
    ]}
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt = True,
    tokenize              = True,
    return_tensors        = 'pt',
    return_dict           = True,
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens = 1024,
        do_sample      = False,  # greedy decoding; temperature has no effect when sampling is off
    )

decoded = processor.decode(
    output_ids[0][inputs['input_ids'].shape[1]:],
    skip_special_tokens = True,
)
result = json.loads(decoded)
print(json.dumps(result, indent=2, ensure_ascii=False))
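Calling json.loads directly on the decoded text fails if the model wraps its answer in a Markdown code fence or adds stray text around the JSON. The following is a minimal sketch of a more tolerant parser; parse_model_json is a hypothetical helper name, and the fallback simply takes the span between the first "{" and the last "}", which is a heuristic rather than a full JSON recovery.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_model_json(text: str) -> dict:
    """Parse a JSON object from model output, tolerating code fences and stray text."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}".
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(cleaned[start : end + 1])
```

With this helper, `result = parse_model_json(decoded)` can stand in for the direct json.loads call; it still raises json.JSONDecodeError when no JSON object can be recovered.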

Load in 4-bit (lighter GPUs)

import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit              = True,
    bnb_4bit_compute_dtype    = torch.bfloat16,
    bnb_4bit_quant_type       = 'nf4',
    bnb_4bit_use_double_quant = True,
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962",
    quantization_config = bnb_config,
    device_map          = 'auto',
)

From PDF

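from pdf2image import convert_from_path  # requires the poppler utilities to be installed

# Render only page 1 at 200 DPI, matching the training preprocessing
pages = convert_from_path('invoice.pdf', dpi=200, first_page=1, last_page=1)
image = pages[0].convert('RGB')
image.thumbnail((1280, 1280))  # assumes the 1280 px training cap applies to the longer side
# then pass `image` to the inference code above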

With Ollama (GGUF)

ollama run gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF

Limitations

  • Optimized for Indian GST invoice formats — may underperform on foreign layouts
  • Scans below 100 DPI or heavily skewed images reduce accuracy
  • Handwritten invoices are not supported
  • Multi-page invoices: only page 1 was used during training
  • Always validate extracted JSON against your business logic before use
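As one example of the validation recommended above, the arithmetic inside the tax block can be cross-checked. This is a minimal sketch, assuming total_tax is the sum of the CGST, SGST, and IGST amounts and grand_total equals taxable_value + total_tax + round_off; invoices with cess or other charges outside the schema would need different rules. The function name tax_issues and the default tolerance are illustrative.

```python
def tax_issues(tax: dict, tolerance: float = 0.01) -> list[str]:
    """Return descriptions of arithmetic mismatches in a `tax` block."""

    def num(key: str) -> float:
        # Treat missing or null fields as zero (e.g. IGST on an intra-state invoice).
        return float(tax.get(key) or 0)

    issues = []

    # Assumed rule 1: total_tax = CGST + SGST + IGST amounts.
    component_sum = num("cgst_amount") + num("sgst_amount") + num("igst_amount")
    if abs(component_sum - num("total_tax")) > tolerance:
        issues.append(f"total_tax {num('total_tax')} != CGST+SGST+IGST {component_sum}")

    # Assumed rule 2: grand_total = taxable_value + total_tax + round_off.
    expected_total = num("taxable_value") + num("total_tax") + num("round_off")
    if abs(expected_total - num("grand_total")) > tolerance:
        issues.append(f"grand_total {num('grand_total')} != expected {expected_total}")

    return issues
```

An empty list means the amounts are internally consistent under these assumptions; a non-empty list is a signal to route the invoice for manual review.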

Citation

@misc{qwen2.5-vl-7b-indian-invoice,
  title        = {Qwen2.5-VL-7B Fine-tuned for Indian Invoice Extraction},
  author       = {Your Name},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962}}
}

Fine-tuned with Unsloth · Merged with PEFT · Trained on NVIDIA A100 80 GB
