Transaction Classifier — Fine-tuned MiniLM (v4)

A fine-tuned sentence-transformers/all-MiniLM-L6-v2 model that classifies raw bank transaction strings into 10 budget categories using standard cross-entropy fine-tuning.

This is version 4 (Phase 4b) in a progressive model development series. It was the production model before being succeeded by the metadata-enriched variant (v7).

Model Details

| Property | Value |
|---|---|
| Base model | sentence-transformers/all-MiniLM-L6-v2 (22M params) |
| Task | Multi-class text classification (10 categories) |
| Training samples | 8,000 |
| Epochs | 3 |
| Batch size | 64 |
| Learning rate | 2e-5 |
| Max sequence length | 64 tokens |
| Loss | Cross-entropy |
| Format | SafeTensors (F32) |
| Trained | 2026-03-29 |
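The table does not state how many optimizer steps the schedule ran. The sketch below works that out under stated assumptions: all 8,000 samples are in the training split (the card does not say how the validation set was drawn), there is one optimizer step per batch, and there is no gradient accumulation. The `FineTuneConfig` class is hypothetical and only mirrors the table.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class FineTuneConfig:
    # Values copied from the Model Details table; the class itself is hypothetical.
    base_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    num_labels: int = 10
    train_samples: int = 8_000
    epochs: int = 3
    batch_size: int = 64
    learning_rate: float = 2e-5
    max_seq_length: int = 64

    def steps_per_epoch(self) -> int:
        # Assumption: every sample is in the training split, and a final
        # partial batch (if any) still counts as one step.
        return math.ceil(self.train_samples / self.batch_size)

    def total_steps(self) -> int:
        # Assumption: one optimizer step per batch, no gradient accumulation.
        return self.steps_per_epoch() * self.epochs


cfg = FineTuneConfig()
print(cfg.steps_per_epoch(), cfg.total_steps())  # 125 375
```

If the Hugging Face `Trainer` was used on a single device (the card does not say), these values would correspond to `num_train_epochs`, `per_device_train_batch_size`, and `learning_rate` in `TrainingArguments`.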

Categories

| ID | Category |
|---|---|
| 0 | Food & Dining |
| 1 | Transportation |
| 2 | Shopping & Retail |
| 3 | Entertainment & Recreation |
| 4 | Healthcare & Medical |
| 5 | Utilities & Services |
| 6 | Financial Services |
| 7 | Income |
| 8 | Government & Legal |
| 9 | Charity & Donations |
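Inference code has to follow this ID order. Transformers checkpoints can also carry the mapping in `model.config.id2label`. The card does not say whether this checkpoint stores the category names there or only the generic `LABEL_<i>` defaults. The hypothetical helper below is a minimal sketch that prefers the config's names and falls back to this table when they look generic or incomplete.

```python
from typing import Dict, Mapping

# ID -> category, copied from the Categories table.
ID2LABEL: Dict[int, str] = {
    0: "Food & Dining",
    1: "Transportation",
    2: "Shopping & Retail",
    3: "Entertainment & Recreation",
    4: "Healthcare & Medical",
    5: "Utilities & Services",
    6: "Financial Services",
    7: "Income",
    8: "Government & Legal",
    9: "Charity & Donations",
}
LABEL2ID: Dict[str, int] = {name: idx for idx, name in ID2LABEL.items()}


def resolve_id2label(config_id2label: Mapping[int, str]) -> Dict[int, str]:
    """Hypothetical helper: use the labels stored in the checkpoint config,
    unless they are missing, incomplete, or the generic LABEL_<i> placeholders."""
    labels = {int(idx): str(name) for idx, name in config_id2label.items()}
    generic = all(name == f"LABEL_{idx}" for idx, name in labels.items())
    if generic or sorted(labels) != sorted(ID2LABEL):
        return dict(ID2LABEL)
    return labels
```

Calling `resolve_id2label(model.config.id2label)` on a loaded model returns an `int -> str` dict that can be indexed with the predicted class ID.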

Performance

Evaluated on 505 unique real-world RBC (Royal Bank of Canada) transaction descriptions from 2019-2026 (3,113 transactions when weighted by occurrence). All results below include the Phase 4b preprocessing fixes.
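The card reports both a unique count (505) and a weighted count (3,113). This minimal sketch shows how the two accuracy figures can diverge. It assumes that "weighted" means each unique description counts once per occurrence. The function name and the toy rows are invented for illustration and are not the evaluation set.

```python
from typing import Iterable, Tuple


def unique_and_weighted_accuracy(
    rows: Iterable[Tuple[str, str, int]],
) -> Tuple[float, float]:
    """Return (unique accuracy, occurrence-weighted accuracy).

    Each row is (true_category, predicted_category, occurrences) for one unique
    transaction description. Assumption: "weighted" counts a description once
    per time it occurs.
    """
    unique_total = unique_correct = 0
    weighted_total = weighted_correct = 0
    for true_cat, pred_cat, occurrences in rows:
        hit = int(pred_cat == true_cat)
        unique_total += 1
        unique_correct += hit
        weighted_total += occurrences
        weighted_correct += occurrences * hit
    if unique_total == 0 or weighted_total == 0:
        raise ValueError("need at least one row with a positive occurrence count")
    return unique_correct / unique_total, weighted_correct / weighted_total


# Invented rows, not the real evaluation set: one frequent merchant dominates.
toy_rows = [
    ("Transportation", "Transportation", 40),
    ("Charity & Donations", "Financial Services", 1),
    ("Utilities & Services", "Shopping & Retail", 1),
    ("Food & Dining", "Food & Dining", 8),
]
print(unique_and_weighted_accuracy(toy_rows))  # (0.5, 0.96)
```

The same predictions score 50% on unique descriptions but 96% when weighted, so weighted and unique accuracies should not be compared directly.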

Overall

| Metric | Score |
|---|---|
| Real-world accuracy (weighted) | 86.5% |
| ML-only accuracy | 78.7% |
| Validation accuracy | 93.0% |

Per-Category Accuracy

| Category | Accuracy |
|---|---|
| Income | 100.0% |
| Healthcare & Medical | 100.0% |
| Financial Services | 94.7% |
| Food & Dining | 89.3% |
| Entertainment & Recreation | 88.6% |
| Transportation | 83.3% |
| Shopping & Retail | 78.9% |
| Utilities & Services | 68.4% |
| Government & Legal | 54.5% |
| Charity & Donations | 0.0% |
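The card does not define per-category accuracy. A common reading, assumed here, is per-class recall: among transactions whose true category is C, the fraction predicted as C. Under that reading, a category can sit at 0.0% without pulling the overall score down much if it makes up a small share of the evaluation set. A minimal sketch with invented pairs:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def per_category_accuracy(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-class recall: correct predictions / transactions, per true category.

    `pairs` yields (true_category, predicted_category). For occurrence-weighted
    figures, repeat each pair once per occurrence.
    """
    totals: Counter = Counter()
    correct: Counter = Counter()
    for true_cat, pred_cat in pairs:
        totals[true_cat] += 1
        if pred_cat == true_cat:
            correct[true_cat] += 1
    scores = {cat: correct[cat] / n for cat, n in totals.items()}
    # Best to worst, like the Per-Category Accuracy table.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Invented pairs, not the real evaluation set.
toy_pairs = [
    ("Income", "Income"),
    ("Income", "Income"),
    ("Charity & Donations", "Financial Services"),
    ("Utilities & Services", "Utilities & Services"),
    ("Utilities & Services", "Shopping & Retail"),
]
print(per_category_accuracy(toy_pairs))
# {'Income': 1.0, 'Utilities & Services': 0.5, 'Charity & Donations': 0.0}
```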

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "maaz-zaidi/transaction-classifier-minilm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Position i in this list is class ID i (see the Categories table).
categories = [
    "Food & Dining", "Transportation", "Shopping & Retail",
    "Entertainment & Recreation", "Healthcare & Medical",
    "Utilities & Services", "Financial Services", "Income",
    "Government & Legal", "Charity & Donations"
]

text = "UBER TRIP HELP.UBER.COM ON"
# Truncate to the 64-token maximum sequence length used in training.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    logits = model(**inputs).logits
    predicted = torch.argmax(logits, dim=-1).item()  # highest-scoring class ID

print(f"Category: {categories[predicted]}")
# Output: Category: Transportation
```
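To score several transactions at once and see how confident the model is, the inputs can be padded into one batch and the logits turned into softmax probabilities. The sketch below does this, with two caveats. Softmax scores are not guaranteed to be calibrated. The function names `rank_categories` and `classify_batch` are hypothetical. The heavy imports live inside `classify_batch`, so the ranking helper also works without torch installed.

```python
import math
from typing import List, Sequence, Tuple

CATEGORIES = [
    "Food & Dining", "Transportation", "Shopping & Retail",
    "Entertainment & Recreation", "Healthcare & Medical",
    "Utilities & Services", "Financial Services", "Income",
    "Government & Legal", "Charity & Donations",
]


def rank_categories(logits: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Turn one row of 10 logits into the top-k (category, probability) pairs."""
    if len(logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(CATEGORIES[i], probs[i]) for i in order[:k]]


def classify_batch(
    texts: List[str],
    model_name: str = "maaz-zaidi/transaction-classifier-minilm",
    k: int = 3,
) -> List[List[Tuple[str, float]]]:
    """Score several transaction strings in one padded forward pass."""
    # Imported here so rank_categories() can be used without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True, max_length=64
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return [rank_categories(row, k) for row in logits.tolist()]
```

For example, `classify_batch(["UBER TRIP HELP.UBER.COM ON"])` downloads the checkpoint from the Hub and returns the top three categories with their probabilities. A low top-1 probability is one possible signal for routing a transaction to manual review.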

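Training Data

The model was fine-tuned on 8,000 samples from a synthetic dataset of labeled transaction strings (see Model Details and Limitations).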

Key Improvements Over Previous Versions

  • v3 (SetFit) -> v4: Switched from SetFit's contrastive learning to standard cross-entropy fine-tuning. Accuracy improved from 80.5% to 84.5%.
  • Phase 4b fixes: Preprocessing improvements (mapping AMZN MKTP to AMAZON MARKETPLACE; adding markers for ATM, mobile deposit, and card fee transactions). Accuracy improved from 84.5% to 86.5%.
  • Utilities & Services accuracy jumped from 34.2% to 68.4%.
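The Phase 4b bullet names two preprocessing changes: expanding AMZN MKTP to AMAZON MARKETPLACE, and marking ATM, mobile deposit, and card fee transactions. The card does not publish the exact rules or the marker format. This is a minimal sketch under that gap: only the AMZN MKTP expansion comes from the card. The regex patterns and the bracketed marker tokens (`[ATM]`, `[MOBILE_DEPOSIT]`, `[CARD_FEE]`) are hypothetical.

```python
import re
from typing import List, Tuple

# Abbreviation expansions; only the AMZN MKTP case is taken from the card.
EXPANSIONS: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\bAMZN\s+MKTP\b"), "AMAZON MARKETPLACE"),
]

# Hypothetical marker rules: (pattern detecting the transaction type, marker token).
MARKERS: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\bATM\b"), "[ATM]"),
    (re.compile(r"\bMOBILE\s+(?:CHEQUE\s+)?DEPOSIT\b"), "[MOBILE_DEPOSIT]"),
    (re.compile(r"\bCARD\s+FEE\b"), "[CARD_FEE]"),
]


def preprocess(description: str) -> str:
    """Normalize a raw bank description before tokenization (illustrative only)."""
    text = " ".join(description.upper().split())  # uppercase, collapse whitespace
    for pattern, replacement in EXPANSIONS:
        text = pattern.sub(replacement, text)
    # Prepend a marker token for every transaction type detected in the text.
    markers = [token for pattern, token in MARKERS if pattern.search(text)]
    return " ".join(markers + [text])
```

For example, `preprocess("ATM WITHDRAWAL 1234")` returns `"[ATM] ATM WITHDRAWAL 1234"`. Putting markers at the front keeps them inside the 64-token window even when a long description is truncated.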

Part of a Series

See the Transaction Classifier collection for all 7 model versions.

Limitations

  • Trained on only 8,000 samples from a synthetic dataset
  • Charity & Donations: 0% accuracy due to insufficient training examples
  • Domain-specific to Canadian banking transaction formats
  • Best results are achieved when the model runs inside a multi-stage pipeline (direction detection + rules + merchant knowledge base + ML)
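The card lists the pipeline stages but does not describe how each one works. The sketch below is a hypothetical cascade that runs the stages in the order the card lists them. Each stage either returns a category or passes. The ML model (any `str -> str` callable, for example one built on the Usage snippet) runs only as the fallback. The sign convention for amounts, the rule table, and the merchant entries are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Transaction:
    description: str
    amount: float  # assumption: positive = money in, negative = money out


# Hypothetical rule table: substring -> category, checked in order.
RULES = [
    ("SERVICE CHARGE", "Financial Services"),
    ("DONATION", "Charity & Donations"),
]

# Hypothetical merchant knowledge base keyed by normalized merchant name.
MERCHANT_KB = {
    "UBER": "Transportation",
    "AMAZON MARKETPLACE": "Shopping & Retail",
}


def detect_direction(txn: Transaction) -> Optional[str]:
    # Hypothetical: money in is labeled Income. A real system would need
    # exceptions for refunds and transfers between a customer's own accounts.
    return "Income" if txn.amount > 0 else None


def apply_rules(txn: Transaction) -> Optional[str]:
    text = txn.description.upper()
    for needle, category in RULES:
        if needle in text:
            return category
    return None


def lookup_merchant(txn: Transaction) -> Optional[str]:
    text = txn.description.upper()
    for merchant, category in MERCHANT_KB.items():
        if merchant in text:
            return category
    return None


def classify(txn: Transaction, ml_classifier: Callable[[str], str]) -> Tuple[str, str]:
    """Return (category, deciding stage); the ML model is only the fallback."""
    for stage_name, stage in (
        ("direction", detect_direction),
        ("rules", apply_rules),
        ("merchant_kb", lookup_merchant),
    ):
        category = stage(txn)
        if category is not None:
            return category, stage_name
    return ml_classifier(txn.description), "ml"
```

Returning the deciding stage alongside the category makes it possible to measure ML-only accuracy separately from full-pipeline accuracy.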

Citation

```bibtex
@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-minilm}
}
```