Transaction Classifier — CANINE (v6)
A fine-tuned google/canine-s character-level model that classifies bank transaction strings into 10 budget categories. This model processes raw characters directly without WordPiece tokenization, making it theoretically better suited for the abbreviations and non-standard text found in bank transactions.
This is version 6 (Phase 6b) in a progressive model development series. It was an experimental model that did not improve over the MiniLM baseline and was not adopted for production use.
Model Details
| Property | Value |
|---|---|
| Base model | google/canine-s (tokenization-free; pre-trained with a subword loss) |
| Task | Multi-class text classification (10 categories) |
| Training samples | 173,761 (50K base + 3x augmentation) |
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 5e-5 |
| Max sequence length | 128 characters |
| Loss | Cross-entropy |
| Format | SafeTensors |
| Size | ~504 MB |
| Trained | 2026-04-03 |
Categories
| ID | Category |
|---|---|
| 0 | Food & Dining |
| 1 | Transportation |
| 2 | Shopping & Retail |
| 3 | Entertainment & Recreation |
| 4 | Healthcare & Medical |
| 5 | Utilities & Services |
| 6 | Financial Services |
| 7 | Income |
| 8 | Government & Legal |
| 9 | Charity & Donations |
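The table above can be turned into lookup dictionaries for decoding predictions. This is a minimal sketch, assuming the class IDs emitted by the model follow the order of the table; the helper name `decode_prediction` is hypothetical and not part of the model repository.

```python
# Category names in the order listed in the table above (IDs 0 through 9).
CATEGORIES = [
    "Food & Dining",
    "Transportation",
    "Shopping & Retail",
    "Entertainment & Recreation",
    "Healthcare & Medical",
    "Utilities & Services",
    "Financial Services",
    "Income",
    "Government & Legal",
    "Charity & Donations",
]

# Forward and reverse lookups between integer IDs and category names.
id2label = dict(enumerate(CATEGORIES))
label2id = {name: idx for idx, name in id2label.items()}


def decode_prediction(class_id: int) -> str:
    """Return the category name for a predicted class ID (hypothetical helper)."""
    if class_id not in id2label:
        raise ValueError(f"Unknown class ID: {class_id}")
    return id2label[class_id]


print(decode_prediction(0))  # Food & Dining
```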
Performance
| Metric | Score |
|---|---|
| Validation accuracy | 98.2% |
Note: This model regressed on real-world evaluation compared to the MiniLM fine-tuned model (v4). While character-level processing is conceptually appealing for noisy bank transaction text, the pre-trained semantic knowledge in MiniLM's sentence embeddings proved more valuable than CANINE's character-level flexibility.
Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hub
model_name = "maaz-zaidi/transaction-classifier-canine"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Category names, indexed by class ID (see the Categories table)
categories = [
    "Food & Dining", "Transportation", "Shopping & Retail",
    "Entertainment & Recreation", "Healthcare & Medical",
    "Utilities & Services", "Financial Services", "Income",
    "Government & Legal", "Charity & Donations"
]

# Encode a raw transaction string, truncated to the training length
text = "MCDONALD'S #12345 TORONTO ON"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# The highest-scoring logit gives the predicted class ID
predicted = torch.argmax(logits, dim=-1).item()
print(f"Category: {categories[predicted]}")
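The usage snippet above reports only the top class. To inspect how confident a prediction is, the logits can be converted to probabilities with a softmax and ranked. The sketch below does this in plain Python so the arithmetic is visible; the logit values are made up for illustration and are not real model output, and `top_k` is a hypothetical helper. With PyTorch tensors, `torch.softmax(logits, dim=-1)` performs the same conversion.

```python
import math

CATEGORIES = [
    "Food & Dining", "Transportation", "Shopping & Retail",
    "Entertainment & Recreation", "Healthcare & Medical",
    "Utilities & Services", "Financial Services", "Income",
    "Government & Legal", "Charity & Donations",
]


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs (hypothetical helper)."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a single transaction, NOT real model output.
example_logits = [4.1, 0.3, 1.2, 0.1, -0.5, -1.0, 0.0, -2.0, -1.5, -0.8]

for label, prob in top_k(example_logits, CATEGORIES):
    print(f"{label}: {prob:.3f}")
```

A low top probability, or a small gap between the first and second candidates, is one possible signal for routing a transaction to manual review.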
Training Data
- Primary: mitulshah/transaction-categorization (gated dataset). 50K base samples were expanded with 3x abbreviation augmentation, for 173,761 training samples in total.
- Augmentation: three variants were generated per sample, using character-level abbreviation patterns common in bank transactions.
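The exact augmentation rules are not documented here. The following is only an illustrative sketch of what character-level abbreviation augmentation could look like; the three patterns (dropping inner vowels, truncating words, removing spaces) and all function names are assumptions, not the author's actual pipeline.

```python
VOWELS = set("AEIOU")


def drop_inner_vowels(word: str) -> str:
    """Keep the first character, then drop vowels from the rest of the word."""
    if len(word) <= 3:  # leave short tokens such as "ON" untouched
        return word
    return word[0] + "".join(c for c in word[1:] if c.upper() not in VOWELS)


def truncate_word(word: str, keep: int = 4) -> str:
    """Keep only the first `keep` characters of a word."""
    return word[:keep]


def abbreviate(text: str, word_fn) -> str:
    """Apply a word-level abbreviation function to every whitespace-separated token."""
    return " ".join(word_fn(word) for word in text.split())


def augment(text: str) -> list:
    """Generate up to three abbreviated variants of a transaction string."""
    candidates = [
        abbreviate(text, drop_inner_vowels),  # assumed pattern 1
        abbreviate(text, truncate_word),      # assumed pattern 2
        text.replace(" ", ""),                # assumed pattern 3
    ]
    variants = []
    for candidate in candidates:
        # Skip variants identical to the input or to an earlier variant.
        if candidate != text and candidate not in variants:
            variants.append(candidate)
    return variants


print(augment("WALMART SUPERCENTER"))
```

Each variant would keep the category label of the sample it was derived from.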
Why This Experiment
Bank transactions contain heavy abbreviations (MCDNLDS, AMZN MKTP, WLMRT) that WordPiece tokenizers tend to split into fragments carrying little meaning. CANINE processes raw characters, so in theory it should handle these better. In practice, the pre-trained world knowledge in MiniLM's sentence embeddings (knowing that "MCDONALD'S" is a restaurant) was more valuable than character-level robustness.
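To make "processes raw characters" concrete: CANINE has no vocabulary file, and each character is represented by its Unicode code point. The sketch below mimics that encoding in plain Python rather than calling the real tokenizer. The special code points for the classification and separator markers, and the way truncation reserves room for them, reflect my understanding of the Transformers CANINE tokenizer and should be treated as assumptions; `encode_chars` is a hypothetical name.

```python
# Assumed special code points (Unicode private-use area) for [CLS] and [SEP].
CLS = 0xE000
SEP = 0xE001


def encode_chars(text: str, max_length: int = 128) -> list:
    """Encode text as Unicode code points with [CLS]/[SEP], truncated to max_length."""
    budget = max_length - 2  # reserve two positions for the special markers
    body = [ord(ch) for ch in text][:budget]
    return [CLS] + body + [SEP]


full = encode_chars("MCDONALD'S")
abbreviated = encode_chars("MCDNLDS")

# Every character of the abbreviation also occurs in the full merchant name,
# so no input unit is "unknown" to a character-level model.
print(set(abbreviated) <= set(full))  # True
print(len(encode_chars("X" * 500)))   # 128
```

Because every string maps to known code points, abbreviations never produce out-of-vocabulary units. As the experiment showed, that property alone did not outweigh MiniLM's pre-trained knowledge.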
Part of a Series
See the Transaction Classifier collection for all 7 model versions.
Limitations
- Regressed on real-world accuracy compared to MiniLM (v4)
- 504 MB model size (~6x larger than MiniLM models)
- Character-level models require more training data and compute to match subword models with pre-trained knowledge
Citation
@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-canine}
}