Invoice NER v1 — mychen76

A LayoutLMv3-large model fine-tuned for Named Entity Recognition (NER) on invoice documents. It is part of a fine-tuning pipeline being developed for production invoice processing, including Ghanaian and West African invoices.

Model Description

This model extracts 15 structured entity types from invoice images using text content and spatial layout together. LayoutLMv3 is a multimodal transformer that processes words, bounding boxes, and image patches jointly, which generally makes it more accurate than text-only NER models on document images.
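LayoutLMv3 expects each word's bounding box as integer coordinates on a 0-1000 grid relative to the page size. The snippet below is a minimal sketch of that normalisation, assuming the OCR step yields pixel-space `(x0, y0, x1, y1)` boxes; the function name `normalize_box` and the sample page size are illustrative, not taken from this model's pipeline.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 integer grid."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp to guard against OCR boxes that spill slightly past the page edge.
    return [min(1000, max(0, v)) for v in scaled]


# Hypothetical word box on an 850 x 1100 pixel page.
print(normalize_box((85, 110, 425, 165), 850, 1100))  # [100, 100, 500, 150]
```

If the OCR engine already returns coordinates relative to the page (fractions between 0 and 1), the same function can be called with `page_width=1` and `page_height=1`.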

Training Data

Property	Detail
Source dataset	`mychen76/invoices-and-receipts_ocr_v1`
Total samples	2,043
Kept after filtering	416 (20.4% keep rate)
OCR engine	docTR (db_resnet50 + crnn_vgg16_bn)
Annotation method	Fuzzy match (threshold=90%) + spatial constraints
Min labelled tokens	8 non-O tags per sample
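The filtering rule in the table can be sketched as follows. This is an illustration under assumptions, not the original filtering code: it assumes each sample carries one tag string per OCR token, and the helper names are hypothetical. The last line reproduces the keep rate from the table.

```python
MIN_NON_O_TAGS = 8  # "Min labelled tokens" from the table above


def count_labelled(tags):
    """Count tokens whose tag is anything other than the outside tag 'O'."""
    return sum(1 for tag in tags if tag != "O")


def keep_sample(tags, min_non_o=MIN_NON_O_TAGS):
    """A sample is kept only if it has enough labelled (non-O) tokens."""
    return count_labelled(tags) >= min_non_o


def keep_rate(kept, total):
    """Share of samples that survive filtering, as a percentage."""
    return round(100 * kept / total, 1)


print(keep_rate(416, 2043))  # 20.4
```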

Training Configuration

Hyperparameter	Value
Base model	`cuongdz01/layoutlmv3-large-cord`
Learning rate	1e-5
Epochs	5
Batch size	4 (2 per GPU × 2 T4 GPUs)
Optimiser	AdamW (weight decay=0.01)
LR scheduler	Linear warmup (10%) + linear decay
Max sequence length	512
Train/val split	90/10 (374 train / 42 val)
Platform	Kaggle 2× T4 GPU
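To make the schedule concrete, the sketch below derives step counts from the table and evaluates a linear warmup followed by linear decay. It assumes one optimiser step per batch of 4 with the final partial batch kept and no gradient accumulation; the card does not confirm these details, so the step counts are illustrative.

```python
import math

BASE_LR = 1e-5
EPOCHS = 5
TRAIN_SAMPLES = 374
BATCH_SIZE = 4
WARMUP_FRACTION = 0.10

# Assumption: one optimiser step per batch, last partial batch kept.
STEPS_PER_EPOCH = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 94
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                   # 470
WARMUP_STEPS = round(WARMUP_FRACTION * TOTAL_STEPS)      # 47


def lr_at(step):
    """Learning rate at a given optimiser step."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / WARMUP_STEPS
    # Linear decay from BASE_LR down to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 1e-5 after 47 steps, about halfway through the first epoch.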

Results

Epoch	Train Loss	Val Loss	Val F1
1	0.7984	0.1447	0.8964
2	0.0918	0.0658	0.9176
3	0.0504	0.0484	0.9446
4	0.0393	0.0492	0.9472
5	0.0335	0.0478	0.9497

Best Val F1: 0.9497 (epoch 5)

Entity Types

This model recognises 15 entity types using BIO tagging: 30 B- and I- labels (15 × 2), plus the O tag for tokens outside any entity:

Entity	Description
INVOICE_NUMBER	Unique invoice identifier
INVOICE_DATE	Date invoice was issued
DUE_DATE	Payment due date
REFERENCE_NUMBER	PO or reference number
SELLER_NAME	Name of the selling entity
SELLER_ADDRESS	Address of the seller
CLIENT_NAME	Name of the buying entity
CLIENT_ADDRESS	Address of the client
ITEM_NAME	Name of line item product/service
ITEM_DESC	Description of line item
QTY	Quantity of line item
UNIT_PRICE	Price per unit
LINE_TOTAL	Total for a single line item
TOTAL_AMOUNT	Final invoice total
TAX_AMOUNT	Tax or VAT amount
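The sketch below builds a BIO label list for the 15 entity types and groups a tagged word sequence back into entity spans. The label ordering is an assumption (the model's own `id2label` mapping may differ), `decode_bio` is a hypothetical helper, and the sample words are invented for illustration.

```python
ENTITY_TYPES = [
    "INVOICE_NUMBER", "INVOICE_DATE", "DUE_DATE", "REFERENCE_NUMBER",
    "SELLER_NAME", "SELLER_ADDRESS", "CLIENT_NAME", "CLIENT_ADDRESS",
    "ITEM_NAME", "ITEM_DESC", "QTY", "UNIT_PRICE", "LINE_TOTAL",
    "TOTAL_AMOUNT", "TAX_AMOUNT",
]

# Assumed ordering: "O" first, then B-/I- pairs per entity type.
LABELS = ["O"] + [f"{p}-{e}" for e in ENTITY_TYPES for p in ("B", "I")]


def decode_bio(words, tags):
    """Group BIO-tagged words into (entity_type, text) spans."""
    spans, current_type, current_words = [], None, []
    for word, tag in zip(words, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # A B- tag, or a stray I- tag of a different type, opens a new span.
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = etype, [word]
        elif prefix == "I":
            # Continuation of the span that is currently open.
            current_words.append(word)
        else:
            # "O" closes any open span.
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, " ".join(current_words)))
    return spans


words = ["Acme", "Ltd", "Invoice", "INV-001", "Total", "1,250.00"]
tags = ["B-SELLER_NAME", "I-SELLER_NAME", "O",
        "B-INVOICE_NUMBER", "O", "B-TOTAL_AMOUNT"]
print(len(LABELS))
print(decode_bio(words, tags))
```

Treating a stray I- tag as the start of a new span is one common convention; stricter decoders discard such tokens instead.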

Annotation Pipeline

Key design decisions that ensure label quality:

docTR only — consistent OCR across all datasets
Spatial constraints — TOTAL_AMOUNT only matched in bottom 30% of page, SELLER_NAME only in top 35%, etc.
Fuzzy threshold 90% — high-confidence matches only
MIN 8 non-O tags — every sample has enough labelled tokens
No images stored — words, boxes, and tags only; images loaded at training time from original dataset
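The fuzzy-match and spatial-constraint steps above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: the card does not name the fuzzy matcher, so Python's standard `difflib` stands in for it; matching is done per single OCR word; boxes are assumed to be on the 0-1000 grid; and only the two spatial bands quoted above are encoded.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 90.0  # percent, as stated in the list above

# Allowed vertical band per entity, as fractions of page height from the top.
# Only the two constraints quoted above are listed; others are not guessed.
SPATIAL_BANDS = {
    "TOTAL_AMOUNT": (0.70, 1.00),  # bottom 30% of the page
    "SELLER_NAME": (0.00, 0.35),   # top 35% of the page
}


def similarity(a, b):
    """Similarity in percent; difflib stands in for the unnamed matcher."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def in_band(entity, box, page_height=1000):
    """Check that the vertical centre of the box lies in the entity's band."""
    low, high = SPATIAL_BANDS.get(entity, (0.0, 1.0))
    centre_y = (box[1] + box[3]) / 2 / page_height
    return low <= centre_y <= high


def match_entity(entity, target, words, boxes):
    """Indices of OCR words that match `target` and satisfy the band."""
    return [
        i for i, (word, box) in enumerate(zip(words, boxes))
        if similarity(word, target) >= FUZZY_THRESHOLD and in_band(entity, box)
    ]


# Hypothetical OCR output: the same amount appears mid-page and near the bottom.
words = ["1,250.00", "Total", "1,250.00"]
boxes = [[700, 300, 800, 320], [600, 900, 660, 920], [700, 900, 800, 920]]
print(match_entity("TOTAL_AMOUNT", "1,250.00", words, boxes))  # [2]
```

The spatial band is what separates the two identical amounts here: the text match alone would label both, while the constraint keeps only the occurrence in the bottom 30% of the page.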

Limitations

Trained on synthetic Western-style invoices only
Performance drops significantly on real-world or non-Western invoice layouts (~25-50% on Ghanaian invoices)
Intermediate checkpoint — superseded by v2, v3, v4, v5

Sequential Fine-tuning Pipeline


Model Details

Property	Detail
Repository	`albertosei/invoice-ner-v1-mychen76`
Format	Safetensors
Model size	0.4B params
Tensor type	F32
Model tree	`microsoft/layoutlmv3-large` → `cuongdz01/layoutlmv3-large-cord` → this model