MiniCPM-V-2_6 LoRA: trade-invoice line-item extraction (grounded-synthetic)

A LoRA adapter for openbmb/MiniCPM-V-2_6 that extracts line items from trade and contractor invoices (HVAC, electrical, plumbing, carpentry, roofing) as structured JSON: {"menu": [{"nm", "cnt", "price"}], "total": ...}. Built for the Quillwright project, a small-model agent that drafts trade estimates.
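The target shape is small enough to check mechanically before using an extraction downstream. The following is a minimal sketch, assuming `cnt`, `price`, and `total` may arrive either as numbers or as numeric strings such as "$1,050.25" (the card does not pin down value types). `LineItem` and `parse_invoice` are hypothetical names, not part of the adapter or of Quillwright.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class LineItem:
    nm: str       # item name
    cnt: float    # quantity
    price: float  # price as printed on the line (assumed numeric after parsing)


def _to_number(value: Any) -> float:
    """Accept 3, 3.5, "3.5", or "$1,234.50" (assumption: only $ and thousands commas)."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(str(value).replace("$", "").replace(",", "").strip())


def parse_invoice(text: str) -> tuple[list[LineItem], float]:
    """Validate output against {"menu": [{"nm", "cnt", "price"}], "total": ...}."""
    data = json.loads(text)
    if not isinstance(data, dict) or "menu" not in data or "total" not in data:
        raise ValueError("expected top-level keys 'menu' and 'total'")
    items = []
    for row in data["menu"]:
        missing = {"nm", "cnt", "price"} - set(row)
        if missing:
            raise ValueError(f"line item missing keys: {sorted(missing)}")
        items.append(LineItem(str(row["nm"]), _to_number(row["cnt"]), _to_number(row["price"])))
    return items, _to_number(data["total"])
```

A malformed reply raises `ValueError` (or `json.JSONDecodeError`, a subclass), so callers can count it as a failed extraction rather than silently accepting partial data.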

Why this exists

We found no public dataset of trade or contractor invoices with structured line-item annotations (we searched the HF Hub, academic benchmarks, and Kaggle; the closest public option, naver-clova-ix/cord-v2, is Indonesian restaurant receipts). So we built a grounded-synthetic corpus: a curated catalog of ~381 real trade parts and services with real price bands (sourced from public retail listings, contractor flat-rate templates, and cost guides), assembled into coherent jobs with code-owned arithmetic (every total is computed by the generator) and rendered through 8 distinct invoice templates. The generator and catalog live in the Quillwright repo (finetune/synth/).
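To make "code-owned arithmetic" concrete: a unit price is sampled inside its band, and every derived number (line amount, total) is computed by the program before rendering, so the label cannot disagree with the rendered document. This is a minimal sketch under stated assumptions. The catalog rows, price bands, and `make_job` are hypothetical stand-ins, not the Quillwright catalog or generator; `price` is assumed to be the line amount (unit × quantity); tax is omitted.

```python
import random
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical catalog rows: (name, low price, high price) in dollars.
CATALOG = [
    ("Hypothetical part A", Decimal("12.00"), Decimal("18.00")),
    ("Hypothetical part B", Decimal("40.00"), Decimal("65.00")),
    ("Hypothetical labor (per hour)", Decimal("85.00"), Decimal("120.00")),
]
CENT = Decimal("0.01")


def make_job(seed: int, n_items: int = 2) -> dict:
    """Sample a job and compute all arithmetic in code (Decimal avoids float drift)."""
    rng = random.Random(seed)  # seeded RNG: same seed -> same job, new seed -> new combination
    menu = []
    for name, low, high in rng.sample(CATALOG, n_items):
        unit = (low + (high - low) * Decimal(rng.random())).quantize(CENT, ROUND_HALF_UP)
        cnt = rng.randint(1, 4)
        # Assumed convention: "price" is the line amount, unit price times quantity.
        menu.append({"nm": name, "cnt": cnt, "price": str((unit * cnt).quantize(CENT))})
    total = sum((Decimal(item["price"]) for item in menu), Decimal("0"))
    return {"menu": menu, "total": str(total)}
```

Because the label is derived from the same numbers that get rendered, the annotations stay consistent with the pixels; training and test splits can then differ only by seed.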

Results: baseline vs. tuned (held-out synthetic test split, n=50)

Held-out test invoices are generated with a different random seed than training, so the model never saw these exact jobs. Deterministic greedy decoding; 0 generation failures.

| Metric | Baseline (un-tuned) | Tuned (this LoRA) | Δ (tuned − baseline) |
|---|---|---|---|
| Item F1 | 0.703 | 0.933 | +0.230 |
| Quantity accuracy | 0.840 | 1.000 | +0.160 |
| Price accuracy | 0.643 | 1.000 | +0.357 |
| Precision | 0.700 | 0.933 | +0.233 |
| Recall | 0.707 | 0.933 | +0.226 |
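The card does not spell out how predicted items are matched to gold items. One common scheme, sketched here as an assumption rather than the evaluation code actually used: match items by normalized name (each gold item at most once), compute precision and recall over items, and score quantity and price accuracy only on matched pairs. How per-invoice scores are aggregated over the 50 invoices is not stated. The table is internally consistent with F1 = 2PR/(P+R): 2 × 0.700 × 0.707 / 1.407 ≈ 0.703.

```python
def normalize(name: str) -> str:
    """Case- and whitespace-insensitive item name (assumed matching key)."""
    return " ".join(name.lower().split())


def item_metrics(pred: list[dict], gold: list[dict]) -> dict:
    """Per-invoice item precision/recall/F1 plus qty/price accuracy on matched pairs."""
    gold_pool: dict[str, list[dict]] = {}
    for g in gold:
        gold_pool.setdefault(normalize(g["nm"]), []).append(g)
    pairs = []
    for p in pred:
        bucket = gold_pool.get(normalize(p["nm"]))
        if bucket:
            pairs.append((p, bucket.pop(0)))  # each gold item can be matched once
    matched = len(pairs)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    # Assumes cnt/price are already numeric (or plain numeric strings).
    qty_acc = sum(float(p["cnt"]) == float(g["cnt"]) for p, g in pairs) / matched if matched else 0.0
    price_acc = sum(float(p["price"]) == float(g["price"]) for p, g in pairs) / matched if matched else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "qty_acc": qty_acc, "price_acc": price_acc}
```

Scoring quantity and price only on matched pairs explains how those accuracies can reach 1.000 while item F1 stays at 0.933: a missed or extra item lowers F1 but never enters the qty/price denominator.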

โš ๏ธ Honest scope: this is an IN-DISTRIBUTION result

The test split shares the same templates, catalog, and price formatting as training; only the job combinations differ. So this measures how well the model learns the generator's trade-document structure, not how well it transfers to real, photographed trade invoices. The perfect quantity and price accuracy (1.000) and the very low final training loss are consistent with a strong in-distribution fit. Read the +0.23 item-F1 gain as "learns trade line-item extraction on clean, consistently formatted invoices," not as "0.93 on a phone photo of a real contractor's bill." Validating against a real-invoice held-out set is documented future work (the generator supports an Augraphy "scanned/photographed" degradation mode, off by default here).

A companion adapter trained on the public CORD benchmark (real receipt photos) shows a more conservative +0.09 item-F1 gain; that is the more honest data point for real-world image noise.
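Since train and test come from the same generator, a different seed yields different job combinations from the same distribution; it makes exact repeats unlikely but does not by itself rule them out. A quick check, sketched with hypothetical helper names: fingerprint each job as its sorted (normalized name, quantity) pairs and intersect the splits. Jobs are assumed to use the {"menu": [...]} shape above.

```python
def job_fingerprint(job: dict) -> tuple:
    """Order-independent key for a job: sorted (normalized name, quantity) pairs."""
    return tuple(sorted((" ".join(str(i["nm"]).lower().split()), str(i["cnt"]))
                        for i in job["menu"]))


def overlapping_jobs(train: list[dict], test: list[dict]) -> list[int]:
    """Indices of test jobs whose line items exactly repeat some training job."""
    seen = {job_fingerprint(j) for j in train}
    return [k for k, j in enumerate(test) if job_fingerprint(j) in seen]
```

An empty result supports the "never saw these exact jobs" reading; it says nothing about template or catalog overlap, which is shared by design.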

Training

  • Base: openbmb/MiniCPM-V-2_6 (8B vision-language model)
  • Data: 1,000 grounded-synthetic trade invoices (clean WeasyPrint renders); held-out 50-invoice test split (different seed). Catalog + generator: Quillwright finetune/synth/.
  • Recipe: OpenBMB's official finetune.py + CPMTrainer on a single L40S GPU (no DeepSpeed), bf16 LoRA. LoRA targets only the LLM self-attention projections (q/k/v/o); the vision tower and resampler are frozen. r=64, α=64, dropout=0.05, lr=1e-5, model_max_length=2048, 3 epochs.
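For readers reproducing the recipe with PEFT directly rather than through finetune.py, the hyperparameters above map onto a `LoraConfig` roughly as follows (the learning rate, sequence length, and epochs are trainer arguments, not LoRA settings). This is a sketch: the target-module regex assumes the LLM submodule is named `llm` with `layers.N.self_attn.{q,k,v,o}_proj` children, which should be verified against `model.named_modules()`; the helper names are hypothetical. With α = r, the LoRA scaling factor α/r is 1.0.

```python
import re

# Assumed module-name pattern: LLM self-attention projections only. Vision-tower
# and resampler modules do not start with "llm.", so they never match and stay frozen.
TARGET_MODULES = r"llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)"


def is_lora_target(module_name: str) -> bool:
    """PEFT treats a string target_modules as a regex that must fully match."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


def build_lora_config():
    """Build the PEFT config for r=64, alpha=64, dropout=0.05 (requires `peft`)."""
    from peft import LoraConfig  # lazy import so the matcher above runs without peft

    return LoraConfig(
        r=64,
        lora_alpha=64,  # scaling = lora_alpha / r = 1.0
        lora_dropout=0.05,
        target_modules=TARGET_MODULES,
    )
```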

Inference

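import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Extraction prompt (asks for the {"menu": [...], "total": ...} schema).
PROMPT = ('Extract the line items from this receipt as JSON with this exact shape: '
          '{"menu": [{"nm": <item name>, "cnt": <quantity>, "price": <price>}], '
          '"total": <grand total>}. Output only the JSON.')

# Load the base model in bf16; the default fp32 load needs twice the GPU memory.
base = AutoModel.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True,
                                 attn_implementation="sdpa", torch_dtype=torch.bfloat16)
# Attach the LoRA adapter; the base model's remote code is already loaded above.
model = PeftModel.from_pretrained(base, "Aarya2004/minicpmv-trade-lora").eval().cuda()
tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True)

img = Image.open("invoice.jpg").convert("RGB")
# MiniCPM-V chat format: the image goes inside the message content, so image=None.
msgs = [{"role": "user", "content": [img, PROMPT]}]
# sampling=False disables sampling, giving deterministic output.
print(model.chat(image=None, msgs=msgs, tokenizer=tok, sampling=False))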
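Even when told "Output only the JSON," chat models sometimes wrap the answer in a Markdown fence or add a sentence around it. A defensive parsing step, sketched as an assumption (the card does not say how replies were parsed during evaluation; `extract_json` is a hypothetical helper), scans for the first decodable JSON object using the standard library's `json.JSONDecoder.raw_decode`.

```python
import json


def extract_json(reply: str) -> dict:
    """Return the first JSON object in a chat reply, ignoring fences or prose."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(reply, start)  # parse starting at this brace
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from here; try the next "{"
        start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")
```

In the snippet above, this would wrap the chat call, e.g. `data = extract_json(model.chat(image=None, msgs=msgs, tokenizer=tok, sampling=False))`.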

MiniCPM-V-2_6's remote code hard-imports flash_attn even when attn_implementation="sdpa" is set. If loading fails with an ImportError, remove flash_attn from the list returned by transformers.dynamic_module_utils.get_imports (see finetune/flash_patch.py in the Quillwright repo); flash-attn itself is not required.
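The workaround can be sketched with `unittest.mock.patch`: wrap the original `get_imports` so its result no longer lists `flash_attn`, and load the model while the patch is active. This is a sketch of the general technique, not a copy of `finetune/flash_patch.py`; `drop_flash_attn` and `load_base_without_flash_attn` are hypothetical names.

```python
from unittest.mock import patch


def drop_flash_attn(imports: list[str]) -> list[str]:
    """Remove flash_attn from a list of required top-level packages."""
    return [name for name in imports if name != "flash_attn"]


def load_base_without_flash_attn(model_id: str = "openbmb/MiniCPM-V-2_6"):
    """Load the base model with transformers' remote-code import check patched."""
    import torch
    from transformers import AutoModel, dynamic_module_utils

    original_get_imports = dynamic_module_utils.get_imports

    def patched_get_imports(filename):
        return drop_flash_attn(original_get_imports(filename))

    # The remote-code import check looks up get_imports on the module at call
    # time, so patching the module attribute covers from_pretrained.
    with patch("transformers.dynamic_module_utils.get_imports", patched_get_imports):
        return AutoModel.from_pretrained(
            model_id, trust_remote_code=True,
            attn_implementation="sdpa", torch_dtype=torch.bfloat16)
```

Call it in place of the `AutoModel.from_pretrained(...)` line in the inference snippet, then attach the adapter with `PeftModel.from_pretrained` as before.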

Attribution

  • Base model: MiniCPM-V-2_6 © OpenBMB.
  • Training data: synthetic, generated by the Quillwright pipeline from a catalog grounded in public pricing data (own license). Fine-tune recipe adapted from OpenBMB's official MiniCPM-V finetune scripts (Apache-2.0).