Qwen2.5-VL 7B — Indian Invoice Extraction

A fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct specialized for extracting structured JSON from Indian GST invoices (B2B, B2C, export, IRN/ACK, multi-layout). It was trained with QLoRA and Unsloth on an NVIDIA A100 80 GB GPU, and the LoRA adapter was merged into the base weights with PEFT merge_and_unload().

Available Versions

Version	Link	Use case
Merged bfloat16	gouri100/Unsloth_Qwen-2.5_7B-Invoice-962	Unquantized inference (16-bit weights)
GGUF Q4_K_M	gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF	llama.cpp / Ollama — light GPU
GGUF Q8_0	gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF	llama.cpp / Ollama — higher quality

Model Summary

Property	Value
Base model	Qwen/Qwen2.5-VL-7B-Instruct
Fine-tuning method	QLoRA (r=64, alpha=128)
Merge method	PEFT merge_and_unload() — bfloat16 safetensors
Framework	Unsloth + TRL SFTTrainer
Hardware	NVIDIA A100 80 GB
Task	Invoice image to Structured JSON
Input types	JPG, PNG, PDF (page 1 at 200 DPI)
Languages	English, Hindi, Tamil, Malayalam, Telugu, Kannada, Bengali
License	Apache 2.0

Training Dataset

Property	Value
Total samples	962
File types	JPG, PNG, PDF
PDF handling	Page 1 extracted at 200 DPI, resized to max 1280px
Invoice types	B2B GST, B2C, Export, IRN/ACK
Annotation	Manually labeled JSON per invoice

Output JSON Schema

{
  "metadata": {
    "invoice_no": "string",
    "invoice_date": "YYYY-MM-DD",
    "irn": "string | null",
    "ack_no": "string | null",
    "ack_date": "string | null"
  },
  "supplier": {
    "name": "string",
    "gstin": "string",
    "address": "string",
    "state_code": "string"
  },
  "buyer": {
    "name": "string",
    "gstin": "string",
    "address": "string",
    "state_code": "string"
  },
  "line_items": [{
    "sl_no": "number",
    "description": "string",
    "hsn_sac": "string",
    "qty": "number",
    "unit": "string",
    "rate": "number",
    "amount": "number"
  }],
  "tax": {
    "taxable_value": "number",
    "cgst_rate": "number",
    "cgst_amount": "number",
    "sgst_rate": "number",
    "sgst_amount": "number",
    "igst_rate": "number",
    "igst_amount": "number",
    "total_tax": "number",
    "grand_total": "number",
    "round_off": "number"
  }
}
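
The schema above can be checked mechanically before a result is passed downstream. The following is a minimal sketch using only the standard library. `REQUIRED_FIELDS`, `LINE_ITEM_FIELDS`, and `find_missing_fields` are hypothetical names introduced here, and treating every listed key as required is an assumption, since this card does not say which fields may be absent.

```python
# Hypothetical helper: reports schema keys that are missing from a parsed result.
# Assumption: every key shown in the schema is expected to be present (null is allowed).
REQUIRED_FIELDS = {
    "metadata": ["invoice_no", "invoice_date", "irn", "ack_no", "ack_date"],
    "supplier": ["name", "gstin", "address", "state_code"],
    "buyer": ["name", "gstin", "address", "state_code"],
    "tax": ["taxable_value", "cgst_rate", "cgst_amount", "sgst_rate", "sgst_amount",
            "igst_rate", "igst_amount", "total_tax", "grand_total", "round_off"],
}
LINE_ITEM_FIELDS = ["sl_no", "description", "hsn_sac", "qty", "unit", "rate", "amount"]


def find_missing_fields(result: dict) -> list[str]:
    """Return dotted paths of schema keys that are absent from `result`."""
    missing = []
    for section, keys in REQUIRED_FIELDS.items():
        block = result.get(section)
        if not isinstance(block, dict):
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    items = result.get("line_items")
    if not isinstance(items, list):
        missing.append("line_items")
    else:
        for index, item in enumerate(items):
            if not isinstance(item, dict):
                missing.append(f"line_items[{index}]")
                continue
            missing.extend(
                f"line_items[{index}].{key}" for key in LINE_ITEM_FIELDS if key not in item
            )
    return missing
```

Calling `find_missing_fields(result)` on a parsed extraction returns an empty list when every key is present, which makes it easy to route incomplete results to manual review.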

Training Configuration

Hyperparameter	Value
Epochs	3
Learning rate	0.0002
LR scheduler	Cosine
Warmup ratio	0.05
Per device batch size	2
Gradient accumulation steps	8
Effective batch size	16
Max sequence length	2048
Precision	bfloat16
LoRA rank (r)	64
LoRA alpha	128
LoRA dropout	0.05
LoRA target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Vision layers fine-tuned	Yes
Gradient checkpointing	Unsloth optimized

Training Results

Metric	Value
Final training loss	0.2594
Total steps	N/A
Training time	2243.16s (37.4 min)
Steps per second	0.082
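
The total step count is not reported, but the figures in the tables above allow a rough cross-check. The sketch below is an estimate, not a logged value: it assumes a single GPU, that all 962 samples were used for training (no held-out split), and that a partial final batch counts as one optimizer step. The variable names are illustrative.

```python
import math

# Values copied from the tables above.
total_samples = 962
per_device_batch_size = 2
gradient_accumulation_steps = 8
epochs = 3
training_time_s = 2243.16
steps_per_second = 0.082

# Effective batch size = per-device batch size * accumulation steps (single GPU assumed).
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Assumption: every sample is used for training and a partial last batch counts as a step.
steps_per_epoch = math.ceil(total_samples / effective_batch_size)
estimated_steps = steps_per_epoch * epochs

# Independent estimate from the reported throughput.
steps_from_throughput = training_time_s * steps_per_second

print(effective_batch_size)             # 16
print(estimated_steps)                  # 183
print(round(steps_from_throughput, 1))  # 183.9
```

Both estimates land at roughly 183 to 184 steps, which suggests the reported time and throughput are consistent with the listed batch configuration.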

Inference

With transformers (merged model)

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch, json

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962",
    torch_dtype = torch.bfloat16,
    device_map  = 'auto',
)
processor = AutoProcessor.from_pretrained("gouri100/Unsloth_Qwen-2.5_7B-Invoice-962")

image = Image.open('invoice.jpg').convert('RGB')

SYSTEM_PROMPT = (
    'You are an expert system for extracting structured data from invoices. '
    'Return ONLY valid JSON. Do NOT include explanations or extra text.'
)

messages = [
    {'role': 'system', 'content': [{'type': 'text', 'text': SYSTEM_PROMPT}]},
    {'role': 'user', 'content': [
        {'type': 'image', 'image': image},
        {'type': 'text',  'text': 'Extract structured invoice data as JSON.'}
    ]}
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt = True,
    tokenize              = True,
    return_tensors        = 'pt',
    return_dict           = True,
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens = 1024,
        do_sample      = False,  # greedy decoding; temperature is ignored in this mode, so it is omitted
    )

decoded = processor.decode(
    output_ids[0][inputs['input_ids'].shape[1]:],
    skip_special_tokens = True,
)
result = json.loads(decoded)
print(json.dumps(result, indent=2, ensure_ascii=False))
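
`json.loads` raises an error if the model wraps its answer in a Markdown code fence or adds stray text around the object. A defensive parser can be sketched as follows. `parse_model_json` is a hypothetical helper, not part of transformers, and whether this model ever emits fenced output is an assumption rather than an observed behavior.

```python
import json


def parse_model_json(text: str) -> dict:
    """Parse a JSON object from model output, tolerating code fences and stray text."""
    cleaned = text.strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fall back to the span between the first "{" and the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(cleaned[start:end + 1])
```

In the snippet above, `result = parse_model_json(decoded)` would replace the direct `json.loads(decoded)` call. Output that is truncated by `max_new_tokens` still fails to parse, which is the desired behavior for incomplete extractions.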

Load in 4-bit (lighter GPUs)

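# Requires the bitsandbytes package and a CUDA-capable GPU
import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration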

bnb_config = BitsAndBytesConfig(
    load_in_4bit              = True,
    bnb_4bit_compute_dtype    = torch.bfloat16,
    bnb_4bit_quant_type       = 'nf4',
    bnb_4bit_use_double_quant = True,
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962",
    quantization_config = bnb_config,
    device_map          = 'auto',
)

From PDF

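from pdf2image import convert_from_path  # requires the poppler utilities to be installed

pages = convert_from_path('invoice.pdf', dpi=200)  # 200 DPI matches the training setup
image = pages[0].convert('RGB')  # only page 1 was used during training
# then follow the inference code above, starting from `messages`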
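
Training pages were resized to a maximum of 1280 px, so applying the same cap at inference keeps inputs close to what the model saw. The sketch below assumes that "max 1280px" refers to the longer side and that the aspect ratio was preserved; neither detail is stated in this card. `fit_within` is a hypothetical helper.

```python
def fit_within(width: int, height: int, max_side: int = 1280) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most `max_side`."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale small images
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Usage with a PIL image such as the PDF page loaded above:
# image = image.resize(fit_within(*image.size))
```

The size computation is kept separate from the PIL call so it can be checked without loading an image.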

With Ollama (GGUF)

ollama run gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF

Limitations

Optimized for Indian GST invoice formats — may underperform on foreign layouts
Scans below 100 DPI or heavily skewed images reduce accuracy
Handwritten invoices are not supported
Multi-page invoices: only page 1 was used during training
Always validate extracted JSON against your business logic before use
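
As one illustration of the validation step recommended above, the sketch below checks GSTIN shape and tax arithmetic on a parsed result. The function name and the 1-rupee tolerance are hypothetical, the GSTIN pattern checks the commonly cited 15-character format only (not the check character), and the identity grand_total = taxable_value + total_tax + round_off is an assumption about how your invoices are totalled.

```python
import re

# 15 characters: 2-digit state code, 10-character PAN, entity code, "Z", check character.
GSTIN_PATTERN = re.compile(r"^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$")


def check_invoice(result: dict, tolerance: float = 1.0) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for party in ("supplier", "buyer"):
        gstin = (result.get(party) or {}).get("gstin") or ""
        # B2C buyers may have no GSTIN, so only non-empty values are checked.
        if gstin and not GSTIN_PATTERN.match(gstin):
            problems.append(f"{party}.gstin has an unexpected format: {gstin!r}")

    tax = result.get("tax") or {}

    def num(key):
        # Missing or non-numeric values are treated as zero in this sketch.
        value = tax.get(key)
        return float(value) if isinstance(value, (int, float)) else 0.0

    components = num("cgst_amount") + num("sgst_amount") + num("igst_amount")
    if abs(components - num("total_tax")) > tolerance:
        problems.append("total_tax does not match CGST + SGST + IGST")
    expected_total = num("taxable_value") + num("total_tax") + num("round_off")
    if abs(expected_total - num("grand_total")) > tolerance:
        problems.append("grand_total does not match taxable_value + total_tax + round_off")
    return problems
```

A non-empty return value is a signal to route the invoice to manual review rather than proof that the extraction is wrong, since invoices with cess, discounts, or freight charges may legitimately break the assumed totals.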

Citation

@misc{qwen2.5-vl-7b-indian-invoice,
  title        = {Qwen2.5-VL-7B Fine-tuned for Indian Invoice Extraction},
  author       = {Your Name},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962}}
}

Fine-tuned with Unsloth · Merged with PEFT · Trained on NVIDIA A100 80 GB
