Transaction Classifier — Fine-tuned MiniLM (v4)

A fine-tuned sentence-transformers/all-MiniLM-L6-v2 model that classifies raw bank transaction strings into 10 budget categories using standard cross-entropy fine-tuning.

This is version 4 (Phase 4b) in a progressive model development series. It was the production model before being succeeded by the metadata-enriched variant (v7).

Model Details

Property	Value
Base model	`sentence-transformers/all-MiniLM-L6-v2` (22M params)
Task	Multi-class text classification (10 categories)
Training samples	8,000
Epochs	3
Batch size	64
Learning rate	2e-5
Max sequence length	64 tokens
Loss	Cross-entropy
Format	SafeTensors
Trained	2026-03-29
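
As a rough sanity check on the hyperparameters above, the sketch below derives the number of optimizer steps they imply. It assumes all 8,000 samples go through the training loop each epoch and that no gradient accumulation is used; neither is stated in this card. `TrainConfig` is an illustrative name, not part of the released code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the Model Details table."""
    train_samples: int = 8_000
    epochs: int = 3
    batch_size: int = 64
    learning_rate: float = 2e-5
    max_seq_length: int = 64

    @property
    def steps_per_epoch(self) -> int:
        # Assumption: every sample is seen once per epoch, last batch kept.
        return math.ceil(self.train_samples / self.batch_size)

    @property
    def total_steps(self) -> int:
        # Assumption: one optimizer step per batch (no gradient accumulation).
        return self.steps_per_epoch * self.epochs


config = TrainConfig()
print(config.steps_per_epoch, config.total_steps)  # 125 375
```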

ID	Category
0	Food & Dining
1	Transportation
2	Shopping & Retail
3	Entertainment & Recreation
4	Healthcare & Medical
5	Utilities & Services
6	Financial Services
7	Income
8	Government & Legal
9	Charity & Donations
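
The ID table above can be expressed as a pair of lookup dictionaries. This is a minimal sketch; `CATEGORIES`, `id2label`, `label2id`, and `decode` are illustrative names, and whether the published model config already carries these labels is not stated in this card.

```python
# Label order copied from the ID/Category table; list index == class ID.
CATEGORIES = [
    "Food & Dining",
    "Transportation",
    "Shopping & Retail",
    "Entertainment & Recreation",
    "Healthcare & Medical",
    "Utilities & Services",
    "Financial Services",
    "Income",
    "Government & Legal",
    "Charity & Donations",
]

id2label = dict(enumerate(CATEGORIES))
label2id = {label: idx for idx, label in id2label.items()}


def decode(class_id: int) -> str:
    """Map a predicted class ID to its category name."""
    if class_id not in id2label:
        raise ValueError(f"class_id must be in 0..{len(CATEGORIES) - 1}, got {class_id}")
    return id2label[class_id]
```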

Performance

Evaluated on 505 unique real-world RBC transactions (3,113 weighted, 2019-2026). Results shown are after Phase 4b preprocessing fixes.

Overall

Metric	Score
Real-world accuracy (weighted)	86.5%
ML-only accuracy	78.7%
Validation accuracy	93.0%
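
The card does not define "weighted". One plausible reading, given 505 unique strings versus 3,113 weighted transactions, is that each unique string counts once per occurrence. The sketch below illustrates that reading only; the rows are toy data, not the evaluation set, and `EvalRow` is a hypothetical structure.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    text: str
    count: int      # assumed weight: occurrences of this unique string
    correct: bool   # whether the prediction matched the label


def unweighted_accuracy(rows: List[EvalRow]) -> float:
    """Each unique transaction string counts once."""
    return sum(r.correct for r in rows) / len(rows)


def weighted_accuracy(rows: List[EvalRow]) -> float:
    """Each unique transaction string counts once per occurrence."""
    total = sum(r.count for r in rows)
    return sum(r.count for r in rows if r.correct) / total


# Toy rows for illustration only.
toy_rows = [
    EvalRow("UBER TRIP HELP.UBER.COM ON", 40, True),
    EvalRow("HYPOTHETICAL GROCERY STORE", 8, True),
    EvalRow("HYPOTHETICAL CHARITY DONATION", 2, False),
]
```

With these toy rows the unweighted accuracy is 2/3 while the weighted accuracy is 48/50, which shows how frequent merchants can dominate a weighted score.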

Per-Category Accuracy

Category	Accuracy
Income	100.0%
Healthcare & Medical	100.0%
Financial Services	94.7%
Food & Dining	89.3%
Entertainment & Recreation	88.6%
Transportation	83.3%
Shopping & Retail	78.9%
Utilities & Services	68.4%
Government & Legal	54.5%
Charity & Donations	0.0%
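
For comparison with the headline figure, the unweighted (macro) mean of the per-category accuracies above can be derived directly from the table. This value is not reported by the card; it is plain arithmetic on the listed numbers and need not match the weighted 86.5%, which is computed over transactions rather than categories.

```python
# Per-category accuracies (percent), copied from the table above.
per_category_accuracy = {
    "Income": 100.0,
    "Healthcare & Medical": 100.0,
    "Financial Services": 94.7,
    "Food & Dining": 89.3,
    "Entertainment & Recreation": 88.6,
    "Transportation": 83.3,
    "Shopping & Retail": 78.9,
    "Utilities & Services": 68.4,
    "Government & Legal": 54.5,
    "Charity & Donations": 0.0,
}

# Macro average: every category counts equally, regardless of its frequency.
macro_average = sum(per_category_accuracy.values()) / len(per_category_accuracy)
weakest = min(per_category_accuracy, key=per_category_accuracy.get)

print(f"{macro_average:.2f}")  # 75.77
print(weakest)                 # Charity & Donations
```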

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "maaz-zaidi/transaction-classifier-minilm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# List index must match the model's class ID (see the ID table above).
categories = [
    "Food & Dining", "Transportation", "Shopping & Retail",
    "Entertainment & Recreation", "Healthcare & Medical",
    "Utilities & Services", "Financial Services", "Income",
    "Government & Legal", "Charity & Donations"
]

# Raw transaction string, truncated to the 64-token training length.
text = "UBER TRIP HELP.UBER.COM ON"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits
    # Highest-scoring class ID for the single input.
    predicted = torch.argmax(logits, dim=-1).item()

print(f"Category: {categories[predicted]}")
# Expected output: Category: Transportation
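
The snippet above returns only the top class. A minimal sketch of turning the logits into probabilities and ranking the top candidates follows. It is written in pure Python so it runs without the model; `example_logits` is a made-up vector, and in practice it would come from something like `logits[0].tolist()`.

```python
import math
from typing import List, Tuple

CATEGORIES = [
    "Food & Dining", "Transportation", "Shopping & Retail",
    "Entertainment & Recreation", "Healthcare & Medical",
    "Utilities & Services", "Financial Services", "Income",
    "Government & Legal", "Charity & Donations",
]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: List[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (category, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(CATEGORIES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for illustration; not real model output.
example_logits = [0.1, 4.2, 0.3, -0.5, -1.0, 1.1, 0.0, -2.0, -0.7, -1.5]

for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

A low top-1 probability could be used to flag a transaction for manual review, although no such threshold is described in this card.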

Training Data

Primary: mitulshah/transaction-categorization (gated dataset), 3.6M records, of which 8,000 were sampled for training
Evaluation: 505 real-world RBC bank transactions (2019-2026)

Key Improvements Over Previous Versions

v3 (SetFit) -> v4: Switched from contrastive learning to standard cross-entropy fine-tuning. Accuracy improved from 80.5% to 84.5%.
Phase 4b fixes: Preprocessing improvements (AMZN MKTP -> AMAZON MARKETPLACE mapping, ATM/mobile deposit/card fee markers). Accuracy improved from 84.5% to 86.5%.
Utilities & Services accuracy rose from 34.2% to 68.4%.
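
The alias expansion mentioned in the Phase 4b fixes could look roughly like the sketch below. Only the `AMZN MKTP` to `AMAZON MARKETPLACE` entry comes from this card; the uppercasing, the whitespace collapsing, and the `normalize` function itself are assumptions, and the ATM, mobile deposit, and card fee markers are not covered.

```python
import re

# Alias table: the single entry documented in this card.
ALIASES = {
    "AMZN MKTP": "AMAZON MARKETPLACE",
}


def normalize(raw: str) -> str:
    """Uppercase, collapse whitespace, then expand known merchant aliases."""
    # Assumption: transaction strings are compared case-insensitively.
    text = re.sub(r"\s+", " ", raw.strip().upper())
    for short, full in ALIASES.items():
        # Word boundaries prevent matches inside longer tokens.
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text


print(normalize("amzn  mktp CA*1A2B3"))  # AMAZON MARKETPLACE CA*1A2B3
```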

Part of a Series

See the Transaction Classifier collection for all 7 model versions.

Limitations

Trained on only 8,000 samples from a synthetic dataset
Charity & Donations: 0% accuracy due to insufficient training examples
Domain-specific to Canadian banking transaction formats
Best results achieved within a multi-stage pipeline (direction detection + rules + merchant KB + ML)
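
The multi-stage pipeline named in the last limitation can be sketched as a chain of stages that each either return a category or defer to the next one. Everything below is hypothetical: the stage order follows the order listed above, but the rule for credits, the `ATM` keyword rule, the merchant table, and the stub ML model are illustrative assumptions, not the author's implementation.

```python
from typing import Callable, Optional, Tuple

# Hypothetical merchant knowledge base: merchant token -> category.
MERCHANT_KB = {"UBER": "Transportation"}


def direction_stage(text: str, amount: float) -> Optional[str]:
    # Assumption: positive amounts are credits and are treated as income.
    return "Income" if amount > 0 else None


def rule_stage(text: str, amount: float) -> Optional[str]:
    # Assumption: a simple keyword rule for ATM transactions.
    return "Financial Services" if "ATM" in text.split() else None


def merchant_kb_stage(text: str, amount: float) -> Optional[str]:
    for token in text.split():
        if token in MERCHANT_KB:
            return MERCHANT_KB[token]
    return None


def classify(text: str, amount: float,
             ml_model: Callable[[str], str]) -> Tuple[str, str]:
    """Return (category, name of the stage that decided)."""
    for stage in (direction_stage, rule_stage, merchant_kb_stage):
        label = stage(text, amount)
        if label is not None:
            return label, stage.__name__
    # Fall back to the ML classifier only when no earlier stage matched.
    return ml_model(text), "ml"


def stub_model(text: str) -> str:
    # Stand-in for the fine-tuned classifier.
    return "Shopping & Retail"


print(classify("UBER TRIP HELP.UBER.COM ON", -12.50, stub_model))
```

Running the deterministic stages first means the ML model only sees transactions that no rule or known merchant could resolve, which is one way a pipeline score could exceed the ML-only score.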

Citation

@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-minilm}
}


Safetensors: model size 22.7M params, tensor type F32
