Text Classification
Scikit-learn
Joblib
Italian
fiscal
expense-categorization
tfidf
random-forest
on-prem
How to use FedCal/expense-categorizer-it with Scikit-learn:
from huggingface_hub import hf_hub_download
import joblib
model = joblib.load(
    hf_hub_download("FedCal/expense-categorizer-it", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html
Expense Categorizer IT v1
A scikit-learn pipeline (TfidfVectorizer + RandomForestClassifier) that classifies
Italian expense descriptions into fiscal categories. Pure machine learning:
no LLM, on-prem, deterministic, ~1 ms per inference. Macro-F1 ≥ 0.80 on the test set.
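Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare category weighs as much as a frequent one. The following is a minimal plain-Python sketch of that computation on made-up labels; the category names are illustrative and are not the model's actual label set. In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up example labels (hypothetical categories)
y_true = ["ristorazione", "ristorazione", "trasporti", "trasporti", "cancelleria"]
y_pred = ["ristorazione", "trasporti", "trasporti", "trasporti", "cancelleria"]

score = macro_f1(y_true, y_pred)
print(round(score, 3))  # → 0.822
```

Here the per-class F1 scores are 2/3, 4/5, and 1, so the macro average is about 0.822.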
Input / Output
- Input: text description of the expense (Italian) + amount in EUR (used as an order-of-magnitude bucket, a weak signal).
- Output: predicted fiscal category.
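The card does not say how the amount bucket is encoded. The sketch below shows one way it could work, assuming the bucket is the base-10 order of magnitude of the amount, appended to the description as a text token. The function names `amount_bucket` and `build_text` and the token format `importo_e<N>` are hypothetical and not taken from the training script.

```python
def amount_bucket(amount_eur: float) -> str:
    """Map an amount in EUR to a coarse order-of-magnitude token (hypothetical format)."""
    if amount_eur <= 0:
        # Assumption: non-positive amounts share a catch-all token
        return "importo_e_na"
    # Digits in the integer part minus one: 0 for < 10, 1 for 10-99, 2 for 100-999, ...
    magnitude = len(str(int(amount_eur))) - 1
    return f"importo_e{magnitude}"


def build_text(description: str, amount_eur: float) -> str:
    """Combine the description and the bucket token into one input string."""
    return f"{description.strip()} {amount_bucket(amount_eur)}"


print(build_text("cena di lavoro con cliente", 85.0))
# → cena di lavoro con cliente importo_e1
```

Encoding the amount as a word-like token lets a text vectorizer treat it as just another term, which matches its description as a weak signal.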
Usage
import joblib
model = joblib.load("expense_categorizer_it_v1.joblib")
# The input text combines the description and the amount bucket (see the training script)
pred = model.predict(["cena di lavoro con cliente"])
print(pred)
Training
TfidfVectorizer on the description (+ amount bucket) → RandomForestClassifier.
Reproducible with the project's train_expense_categorizer.py script
(CSV with columns descrizione, importo, categoria).
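The training flow described above can be sketched roughly as follows. This is not the project's train_expense_categorizer.py: the rows, the category names, the bucket token format, and the hyperparameters (`n_estimators=100`, `random_state=0`) are all assumptions made for illustration. Only the CSV column names and the two estimator classes come from the card.

```python
import csv
import io

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy stand-in for the training CSV (made-up rows and category names)
CSV_TEXT = """descrizione,importo,categoria
cena di lavoro con cliente,85.00,ristorazione
pranzo con fornitore,42.50,ristorazione
biglietto treno Milano Roma,59.90,trasporti
taxi per aeroporto,35.00,trasporti
carta e toner per stampante,120.00,cancelleria
penne e quaderni ufficio,18.40,cancelleria
"""


def to_text(row):
    # Assumption: the amount bucket is appended as an order-of-magnitude token
    bucket = len(str(int(float(row["importo"])))) - 1
    return f'{row["descrizione"]} importo_e{bucket}'


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
texts = [to_text(r) for r in rows]
labels = [r["categoria"] for r in rows]

# A fixed random_state makes training repeatable across runs
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(texts, labels)

sample = {"descrizione": "cena di lavoro con cliente", "importo": "90"}
pred = model.predict([to_text(sample)])
print(pred)
```

A fitted pipeline like this could then be saved with `joblib.dump(model, path)` and reloaded with `joblib.load(path)`.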
Source & Attribution
- Author: Federico Calò — https://federicocalo.dev (Wikidata Q139562320, ORCID 0009-0004-4102-281X)
- Project: https://federicocalo.dev (on-prem fiscal dev tools)
- License: Apache-2.0
Citation
Federico Calò, "Expense Categorizer IT v1", federicocalo.dev, 2026. https://huggingface.co/FedCal/expense-categorizer-it