fin-classifier

Overview

Repository: CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.


Model Details

  • Developed by: Team CodeBlooded
  • Funded by: EpiUse & University of Pretoria
  • Model type: DistilBertForSequenceClassification
  • Language(s) (NLP): English

Intended Use

Primary use case

  • Task: Automated categorization of banking and credit card transaction descriptions for South African banks
  • Users: Personal finance apps, budgeting tools, fintech platforms

Out-of-scope use cases

  • Legal or compliance decisions
  • Any use requiring 100% classification accuracy or safety guarantees

Training Data

  • Source: Kaggle personal_transactions.csv dataset
  • Mapping: Original vendor-level categories mapped into an internal schema of ~M high-level categories (data/categories.json).
  • Feedback augmentation: User-corrected labels from feedback_corrected.json are appended to the training set for continuous improvement.
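The load-and-append step above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name, the CSV column names, and the assumed JSON shape of the feedback file (a list of `{"text", "label"}` objects) are all assumptions.

```python
import csv
import json

def load_training_rows(csv_path, feedback_path):
    """Combine the base Kaggle rows with user-corrected feedback labels.

    Hypothetical sketch: column names and the feedback JSON shape
    are assumptions, not taken from the repository.
    """
    rows = []
    with open(csv_path, newline="") as f:
        for r in csv.DictReader(f):
            rows.append((r["Description"], r["Category"]))
    with open(feedback_path) as f:
        # assumed shape: [{"text": ..., "label": ...}, ...]
        for item in json.load(f):
            rows.append((item["text"], item["label"]))
    return rows
```

Appending corrections rather than overwriting keeps the original distribution intact while letting frequent user fixes gradually outweigh mislabeled base rows on retraining.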

Evaluation

  • Split: 90% train / 10% test split (seed=42) from the training corpus

  • Metric: Macro F1-score

  • Results:

    • Macro F1 on test set: 0.XX (not yet measured)
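The evaluation protocol above (90/10 split with seed 42, macro F1 on the held-out 10%) can be expressed with scikit-learn. This is an illustrative sketch under stated assumptions, not the project's evaluation script; `predict_fn` stands in for whatever produces model predictions.

```python
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def split_and_score(texts, labels, predict_fn):
    """90% train / 10% test split (seed=42), then macro F1 on the test fold.

    predict_fn: any callable mapping a list of texts to predicted labels
    (a hypothetical placeholder for the trained classifier).
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.10, random_state=42
    )
    preds = predict_fn(X_te)
    # Macro F1 averages per-category F1 equally, so rare categories
    # count as much as common ones.
    return f1_score(y_te, preds, average="macro")
```

Macro averaging is the right choice here because transaction categories are imbalanced: a micro average would be dominated by the most frequent categories.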

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    return_all_scores=False
)

example = "STARBUCKS STORE 1234"
print(classifier(example))  # e.g. [{'label': 'Food & Dining', 'score': 0.95}]
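Under the hood, the pipeline applies a softmax over the classification head's logits and returns the argmax label. A minimal sketch of that step, assuming a hypothetical id2label mapping (the real mapping ships in the checkpoint as model.config.id2label):

```python
import torch

# Hypothetical example mapping; the real one comes from model.config.id2label.
id2label = {0: "Food & Dining", 1: "Transport", 2: "Shopping"}

def logits_to_label(logits):
    """Softmax over the classifier head's logits, then argmax to a label."""
    probs = torch.softmax(logits, dim=-1)
    idx = int(torch.argmax(probs))
    return id2label[idx], float(probs[idx])
```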

Limitations & Bias

  • Performance varies by category: categories with fewer examples may see lower F1.
  • The model reflects biases present in the original Kaggle dataset (e.g., over/under-representation of certain merchants).
  • Should not be used as a sole source for financial decision-making.

Maintenance & Continuous Learning

  • New user feedback corrections are stored in model/feedback_corrected.json and incorporated during retraining.
  • Checkpoints are saved to results/ and versioned on Hugging Face.

License

Apache 2.0


Citation

@misc{fin-classifier2025,
  author = {CodeBlooded},
  title = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}

This model card was generated on 2025-07-12.
