Invoice NER v1 - mychen76

A LayoutLMv3-large model fine-tuned for Named Entity Recognition on invoice documents. It is part of a fine-tuning pipeline being developed for production invoice processing, including Ghanaian and West African invoices.

Model Description

This model extracts 15 structured entity types from invoice images using text content and spatial layout together. LayoutLMv3 is a multimodal transformer that processes words, bounding boxes, and image patches jointly, which generally makes it more accurate than text-only NER models on document images.
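
LayoutLMv3 takes each word together with a bounding box whose coordinates are scaled to a 0-1000 grid relative to the page size. Below is a minimal sketch of that preprocessing step, assuming the OCR output has already been converted to pixel coordinates in (x0, y0, x1, y1) order. The function name `normalize_box`, the page size, and the sample words are illustrative and not taken from this model's code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a pixel-space box to the 0-1000 integer grid used by LayoutLMv3."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp to guard against OCR boxes that spill slightly past the page edge.
    return [max(0, min(1000, int(round(v)))) for v in scaled]


# Hypothetical OCR output for a 1240 x 1754 px page.
words = ["Invoice", "No.", "INV-0042"]
pixel_boxes = [(100, 80, 260, 110), (270, 80, 320, 110), (330, 80, 480, 110)]
boxes = [normalize_box(b, 1240, 1754) for b in pixel_boxes]
print(boxes[0])  # [81, 46, 210, 63]
```

The words and normalised boxes are what a LayoutLMv3 processor would receive alongside the page image when OCR is done externally.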

Training Data

Property              Detail
Source dataset        mychen76/invoices-and-receipts_ocr_v1
Total samples         2,043
Kept after filtering  416 (20.4% keep rate)
OCR engine            docTR (db_resnet50 + crnn_vgg16_bn)
Annotation method     Fuzzy match (threshold = 90%) + spatial constraints
Min labelled tokens   8 non-O tags per sample
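
The keep rate follows from the last row: a sample survives only if it carries at least 8 non-O tags. Below is a rough sketch of such a filter, assuming each sample is a dict with a `tags` list of BIO strings. The field name and helper names are hypothetical.

```python
from typing import Dict, List

MIN_LABELLED_TOKENS = 8  # threshold stated in the table above


def count_labelled(tags: List[str]) -> int:
    """Number of tokens tagged with anything other than O."""
    return sum(1 for tag in tags if tag != "O")


def keep_sample(sample: Dict[str, List[str]]) -> bool:
    """True if the sample has enough labelled tokens to be worth training on."""
    return count_labelled(sample["tags"]) >= MIN_LABELLED_TOKENS


# Keep rate reported for this dataset: 416 of 2,043 samples.
keep_rate = 416 / 2043
print(f"{keep_rate:.1%}")  # 20.4%
```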

Training Configuration

Hyperparameter       Value
Base model           cuongdz01/layoutlmv3-large-cord
Learning rate        1e-5
Epochs               5
Batch size           4 (2 per GPU × 2 T4 GPUs)
Optimiser            AdamW (weight decay = 0.01)
LR scheduler         Linear warmup (10%) + linear decay
Max sequence length  512
Train/val split      90/10 (374 train / 42 val)
Platform             Kaggle, 2× T4 GPU
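
For reference, the schedule in the table (10% linear warmup, then linear decay to zero) can be written as a plain function of the step index. This is a sketch under assumptions: the step count is derived here as ceil(374 / 4) steps per epoch times 5 epochs with no gradient accumulation, and the name `lr_at_step` is illustrative. The actual training script may count steps differently.

```python
import math

PEAK_LR = 1e-5
EPOCHS = 5
TRAIN_SAMPLES = 374
BATCH_SIZE = 4
WARMUP_PERCENT = 10

# Assumed step accounting: one optimiser update per batch.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 94
total_steps = steps_per_epoch * EPOCHS                   # 470
warmup_steps = total_steps * WARMUP_PERCENT // 100       # 47


def lr_at_step(step: int) -> float:
    """Learning rate after `step` optimiser updates."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / warmup_steps
    # Linear decay from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)
```

Under these assumptions the peak rate of 1e-5 is reached at step 47, about halfway through the first epoch.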

Results

Epoch  Train Loss  Val Loss  Val F1
1      0.7984      0.1447    0.8964
2      0.0918      0.0658    0.9176
3      0.0504      0.0484    0.9446
4      0.0393      0.0492    0.9472
5      0.0335      0.0478    0.9497

Best Val F1: 0.9497 (epoch 5)

Entity Types

This model recognises 15 entity types using BIO tagging: 30 entity labels in total (a B- and an I- label for each type), plus the O label for tokens outside any entity.

Entity            Description
INVOICE_NUMBER    Unique invoice identifier
INVOICE_DATE      Date the invoice was issued
DUE_DATE          Payment due date
REFERENCE_NUMBER  PO or reference number
SELLER_NAME       Name of the selling entity
SELLER_ADDRESS    Address of the seller
CLIENT_NAME       Name of the buying entity
CLIENT_ADDRESS    Address of the client
ITEM_NAME         Name of the line item product/service
ITEM_DESC         Description of the line item
QTY               Quantity of the line item
UNIT_PRICE        Price per unit
LINE_TOTAL        Total for a single line item
TOTAL_AMOUNT      Final invoice total
TAX_AMOUNT        Tax or VAT amount
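
A label map for this scheme can be generated from the entity names. The sketch below assumes that O takes id 0 and that the entities follow in the order listed above. The ordering in the released model's config may differ, so treat the ids as illustrative.

```python
ENTITY_TYPES = [
    "INVOICE_NUMBER", "INVOICE_DATE", "DUE_DATE", "REFERENCE_NUMBER",
    "SELLER_NAME", "SELLER_ADDRESS", "CLIENT_NAME", "CLIENT_ADDRESS",
    "ITEM_NAME", "ITEM_DESC", "QTY", "UNIT_PRICE", "LINE_TOTAL",
    "TOTAL_AMOUNT", "TAX_AMOUNT",
]

# One B- (begin) and one I- (inside) label per entity type, plus O (outside).
labels = ["O"] + [f"{prefix}-{name}" for name in ENTITY_TYPES for prefix in ("B", "I")]
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(ENTITY_TYPES))  # 15
print(len(labels) - 1)    # 30 B-/I- labels, not counting O
```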

Annotation Pipeline

Key design decisions that ensure label quality:

  • docTR only: consistent OCR across all datasets
  • Spatial constraints: TOTAL_AMOUNT only matched in bottom 30% of page, SELLER_NAME only in top 35%, etc.
  • Fuzzy threshold 90%: high-confidence matches only
  • Minimum of 8 non-O tags: every sample has enough labelled tokens
  • No images stored: words, boxes, and tags only; images are loaded at training time from the original dataset
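
The matching step described above can be sketched as follows. This is an illustration, not the project's annotation code. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whichever fuzzy matcher was actually used. It also reads "bottom 30%" as a word lying entirely below the 70% line of the page and "top 35%" as a word lying entirely above the 35% line. All function names are hypothetical, and only single-word values are handled.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

FUZZY_THRESHOLD = 90.0  # percent

# Vertical band (min_y, max_y) as a fraction of page height, 0.0 = top of page.
# Only the two constraints named in the text are listed here.
SPATIAL_BANDS: Dict[str, Tuple[float, float]] = {
    "TOTAL_AMOUNT": (0.70, 1.00),  # bottom 30% of the page
    "SELLER_NAME": (0.00, 0.35),   # top 35% of the page
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in percent (0-100)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def in_band(entity: str, y_top: float, y_bottom: float) -> bool:
    """True if the word's vertical extent lies inside the entity's band."""
    lo, hi = SPATIAL_BANDS.get(entity, (0.0, 1.0))
    return y_top >= lo and y_bottom <= hi


def tag_single_word(
    entity: str,
    target: str,
    words: List[str],
    y_spans: List[Tuple[float, float]],
) -> List[str]:
    """Tag the best-matching in-band word with B-<entity>, all others with O."""
    tags = ["O"] * len(words)
    best: Optional[int] = None
    best_score = FUZZY_THRESHOLD
    for i, (word, (y_top, y_bottom)) in enumerate(zip(words, y_spans)):
        score = similarity(word, target)
        if score >= best_score and in_band(entity, y_top, y_bottom):
            best, best_score = i, score
    if best is not None:
        tags[best] = f"B-{entity}"
    return tags


# The same amount appears twice; only the occurrence near the page bottom is tagged.
words = ["1,250.00", "Total", "1,250.00"]
y_spans = [(0.40, 0.42), (0.90, 0.92), (0.90, 0.92)]
print(tag_single_word("TOTAL_AMOUNT", "1,250.00", words, y_spans))
# ['O', 'O', 'B-TOTAL_AMOUNT']
```

The example shows why the spatial constraint matters: without it, a line total in the middle of the page with the same value would also be labelled as the invoice total.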

Limitations

  • Trained on synthetic Western-style invoices only
  • Performance drops significantly on real-world or non-Western invoice layouts (~25-50% on Ghanaian invoices)
  • Intermediate checkpoint: superseded by v2, v3, v4, and v5

Sequential Fine-tuning Pipeline

