# fin-classifier

## Overview

Repository: CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.

## Model Details

- Developed by: Team CodeBlooded
- Funded by: EpiUse & University of Pretoria
- Model type: DistilBertForSequenceClassification
- Language(s) (NLP): English
- Version: v1.0 (initial release)
- Hugging Face repo: https://huggingface.co/CodeBlooded-capstone/fin-classifier
- Authors: CodeBlooded
## Intended Use

### Primary use case

- Task: Automated categorization of banking and credit card transaction descriptions for South African banks
- Users: Personal finance apps, budgeting tools, fintech platforms
### Out-of-scope use cases

- Legal or compliance decisions
- Any use requiring 100% classification accuracy or safety guarantees
## Training Data

- Source: Kaggle personal_transactions.csv dataset
- Mapping: Original vendor-level categories mapped into an internal schema of ~M high-level categories (data/categories.json).
- Feedback augmentation: User-corrected labels from feedback_corrected.json are appended to the training set for continuous improvement (see the sketch below).
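The following is a minimal sketch of that preparation step. The column names and the exact schemas of data/categories.json and feedback_corrected.json are assumptions for illustration; consult the repository for the real formats.

```python
import json

import pandas as pd

# Load the Kaggle transactions and the vendor-level -> high-level mapping.
# Column names and JSON schemas below are assumptions for illustration.
df = pd.read_csv("personal_transactions.csv")      # assumed columns: Description, Category
with open("data/categories.json") as f:
    category_map = json.load(f)                    # assumed: {"Coffee Shops": "Food & Dining", ...}

# Map the original vendor-level categories into the internal high-level schema
df["label"] = df["Category"].map(category_map)
df = df.dropna(subset=["label"])                   # drop transactions with no mapping

# Append user-corrected labels so retraining sees the feedback
with open("feedback_corrected.json") as f:
    feedback = json.load(f)                        # assumed: [{"description": ..., "label": ...}, ...]
feedback_df = pd.DataFrame(feedback).rename(columns={"description": "Description"})

train_df = pd.concat(
    [df[["Description", "label"]], feedback_df[["Description", "label"]]],
    ignore_index=True,
)
```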
## Evaluation

- Split: 90% train / 10% test split (seed=42) from the training corpus (see the sketch below)
- Metric: Macro F1-score
- Results: Macro F1 on test set: 0.XX (not yet measured)
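A rough sketch of how the evaluation above could be reproduced with scikit-learn, assuming `train_df` is the Description/label table built in the Training Data step; the project's actual evaluation script may differ.

```python
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from transformers import pipeline

# 90% train / 10% test split with the seed stated above
train, test = train_test_split(train_df, test_size=0.10, random_state=42)

# Score the held-out 10% with the published model
classifier = pipeline("text-classification", model="CodeBlooded-capstone/fin-classifier")
preds = [classifier(text)[0]["label"] for text in test["Description"]]

macro_f1 = f1_score(test["label"], preds, average="macro")
print(f"Macro F1 on the held-out 10%: {macro_f1:.2f}")
```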
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")
classifier = pipeline(
"text-classification",
model=model,
tokenizer=tokenizer,
return_all_scores=False
)
example = "STARBUCKS STORE 1234"
print(classifier(example)) # {'label': 'Food & Dining', 'score': 0.95}
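The pipeline also accepts a list of descriptions, which avoids one call per transaction; the merchant strings below are made-up examples.

```python
descriptions = [
    "UBER *TRIP 5X2K3",
    "WOOLWORTHS SANDTON",
    "NETFLIX.COM",
]

# Batched inference: the pipeline groups inputs into batches of `batch_size`
results = classifier(descriptions, batch_size=8)
for text, result in zip(descriptions, results):
    print(f"{text!r} -> {result['label']} ({result['score']:.2f})")
```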
## Limitations & Bias
- Performance varies by category: categories with fewer examples may see lower F1.
- The model reflects biases present in the original Kaggle dataset (e.g., over/under-representation of certain merchants).
- Should not be used as a sole source for financial decision-making.
## Maintenance & Continuous Learning

- New user feedback corrections are stored in model/feedback_corrected.json and incorporated during retraining (see the sketch below).
- Checkpoints are saved to results/ and versioned on Hugging Face.
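As a rough sketch, a user correction could be recorded like this; the JSON schema (a list of description/label pairs) and the helper below are assumptions based on the description above, not a documented API.

```python
import json
from pathlib import Path

FEEDBACK_PATH = Path("model/feedback_corrected.json")

def record_correction(description: str, corrected_label: str) -> None:
    """Append a user-corrected label so it is picked up at the next retraining run."""
    feedback = json.loads(FEEDBACK_PATH.read_text()) if FEEDBACK_PATH.exists() else []
    feedback.append({"description": description, "label": corrected_label})
    FEEDBACK_PATH.write_text(json.dumps(feedback, indent=2))

record_correction("STARBUCKS STORE 1234", "Food & Dining")
```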
## License
Apache 2.0
Citation
@misc{fin-classifier2025,
author = {CodeBlooded},
title = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
year = {2025},
howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
This model card was generated on 2025-07-12.