fin-classifier

Overview

Repository: CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.


Model Details

  • Developed by: Team CodeBlooded
  • Funded by: EpiUse & University of Pretoria
  • Model type: DistilBertForSequenceClassification
  • Language(s) (NLP): English

Intended Use

Primary use case

  • Task: Automated categorization of banking and credit card transaction descriptions for South African banks
  • Users: Personal finance apps, budgeting tools, fintech platforms

Out-of-scope use cases

  • Legal or compliance decisions
  • Any use requiring 100% classification accuracy or safety guarantees

Training Data

  • Source: Kaggle personal_transactions.csv dataset
  • Mapping: Original vendor-level categories mapped into an internal schema of ~M high-level categories (data/categories.json).
  • Feedback augmentation: User-corrected labels from feedback_corrected.json are appended to the training set for continuous improvement.
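The load-and-append step above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name, the CSV column names, and the assumed JSON shape of the feedback file (a list of `{"text", "label"}` objects) are all assumptions.

```python
import csv
import json

def load_training_rows(csv_path, feedback_path):
    """Combine the base Kaggle rows with user-corrected feedback labels.

    Hypothetical sketch: column names and the feedback JSON shape
    are assumptions, not taken from the repository.
    """
    rows = []
    with open(csv_path, newline="") as f:
        for r in csv.DictReader(f):
            rows.append((r["Description"], r["Category"]))
    with open(feedback_path) as f:
        # assumed shape: [{"text": ..., "label": ...}, ...]
        for item in json.load(f):
            rows.append((item["text"], item["label"]))
    return rows
```

Appending corrections rather than overwriting keeps the original distribution intact while letting frequent user fixes gradually outweigh mislabeled base rows on retraining.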

Evaluation

  • Split: 90% train / 10% test split (seed=42) from the training corpus

  • Metric: Macro F1-score

  • Results:

    • Macro F1 on test set: 0.XX (not yet measured)
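The evaluation protocol above (90/10 split with seed 42, macro F1 on the held-out 10%) can be expressed with scikit-learn. This is an illustrative sketch under stated assumptions, not the project's evaluation script; `predict_fn` stands in for whatever produces model predictions.

```python
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def split_and_score(texts, labels, predict_fn):
    """90% train / 10% test split (seed=42), then macro F1 on the test fold.

    predict_fn: any callable mapping a list of texts to predicted labels
    (a hypothetical placeholder for the trained classifier).
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.10, random_state=42
    )
    preds = predict_fn(X_te)
    # Macro F1 averages per-category F1 equally, so rare categories
    # count as much as common ones.
    return f1_score(y_te, preds, average="macro")
```

Macro averaging is the right choice here because transaction categories are imbalanced: a micro average would be dominated by the most frequent categories.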

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    return_all_scores=False
)

example = "STARBUCKS STORE 1234"
print(classifier(example))  # e.g. [{'label': 'Food & Dining', 'score': 0.95}]
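Under the hood, the pipeline applies a softmax over the classification head's logits and returns the argmax label. A minimal sketch of that step, assuming a hypothetical id2label mapping (the real mapping ships in the checkpoint as model.config.id2label):

```python
import torch

# Hypothetical example mapping; the real one comes from model.config.id2label.
id2label = {0: "Food & Dining", 1: "Transport", 2: "Shopping"}

def logits_to_label(logits):
    """Softmax over the classifier head's logits, then argmax to a label."""
    probs = torch.softmax(logits, dim=-1)
    idx = int(torch.argmax(probs))
    return id2label[idx], float(probs[idx])
```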

Limitations & Bias

  • Performance varies by category: categories with fewer examples may see lower F1.
  • The model reflects biases present in the original Kaggle dataset (e.g., over/under-representation of certain merchants).
  • Should not be used as a sole source for financial decision-making.

Maintenance & Continuous Learning

  • New user feedback corrections are stored in model/feedback_corrected.json and incorporated during retraining.
  • Checkpoints are saved to results/ and versioned on Hugging Face.

License

Apache 2.0


Citation

@misc{fin-classifier2025,
  author = {CodeBlooded},
  title = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}

This model card was generated on 2025-07-12.
