finbert-minilm-sentiment

A compact financial-sentiment classifier: 33.4M parameters, 95.3% accuracy on a held-out test set.

finbert-minilm-sentiment classifies financial text as negative, neutral, or positive. It is a compact MiniLM encoder (microsoft/MiniLM-L12-H384-uncased, 33.4M parameters) fine-tuned on the Financial PhraseBank benchmark. All metrics below were measured on a held-out test set that was used neither for training nor for model selection; none are illustrative or synthetic figures.

Companion project: this is the production variant. The from-scratch educational transformer (~2M params, no pretrained weights, fully annotated) lives at github.com/shaikn6/nano-finbert and is built to teach the internals; this model instead fine-tunes a pretrained encoder for accuracy.

Results (held-out test set)

Evaluated on a stratified 15% test split (340 sentences) of Financial PhraseBank Sentences_AllAgree — held out completely during training and validation.

Metric	Value
Test accuracy	95.29%
Test macro-F1	0.9368

Per-class performance:

Class	Precision	Recall	F1	Support
negative	0.9130	0.9333	0.9231	45
neutral	0.9902	0.9665	0.9782	209
positive	0.8889	0.9302	0.9091	86
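As an arithmetic cross-check, the per-class counts implied by this table can be back-computed: recall × support gives the true positives (0.9333 × 45 ≈ 42), and true positives / precision gives how many sentences the model assigned to each class (42 / 0.9130 ≈ 46). The plain-Python sketch below is not the author's evaluation code; it simply recomputes every reported figure from those derived counts.

```python
# Counts back-computed from the reported table (not read from a confusion matrix):
#   true positives  = recall * support           (e.g. 0.9333 * 45 -> 42)
#   predicted count = true positives / precision (e.g. 42 / 0.9130 -> 46)
classes = ["negative", "neutral", "positive"]
true_pos = [42, 202, 80]
support = [45, 209, 86]     # gold labels per class (sums to 340)
predicted = [46, 204, 90]   # model predictions per class (also sums to 340)

f1_scores = []
for name, tp, sup, pred in zip(classes, true_pos, support, predicted):
    precision = tp / pred
    recall = tp / sup
    f1 = 2 * tp / (pred + sup)  # algebraically equal to 2PR / (P + R)
    f1_scores.append(f1)
    print(f"{name:8s} P={precision:.4f} R={recall:.4f} F1={f1:.4f}")

accuracy = sum(true_pos) / sum(support)     # 324 / 340
macro_f1 = sum(f1_scores) / len(f1_scores)  # unweighted mean over the 3 classes
print(f"accuracy={accuracy:.4f} macro-F1={macro_f1:.4f}")
```

The last line prints accuracy=0.9529 and macro-F1=0.9368, so the per-class rows, the headline accuracy, and the macro-F1 are mutually consistent.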

Dataset

Source: Financial PhraseBank (Malo et al., 2014), config Sentences_AllAgree — 2,264 sentences where all annotators agreed on the label (the cleanest subset).
Labels: 3-class sentiment (negative, neutral, positive).
Split: stratified 70 / 15 / 15 train / validation / test (random_state=42).
- train = 1,584 · validation = 340 · test = 340 (held out)
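The split can be sketched with scikit-learn's `train_test_split`, assuming the test set is carved off first and the validation set second, both stratified with `random_state=42`. This two-stage procedure and the label ids (0 = negative, 1 = neutral, 2 = positive) are assumptions, not the author's published script; integer indices stand in for the sentences, and the class counts are the AllAgree distribution (303 negative, 1,391 neutral, 570 positive).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# AllAgree label distribution; label ids 0/1/2 = negative/neutral/positive (assumed).
labels = np.array([0] * 303 + [1] * 1391 + [2] * 570)
indices = np.arange(len(labels))  # stand-ins for the 2,264 sentences

# Stage 1: hold out 15% of all data as the test set.
trainval_idx, test_idx, y_trainval, y_test = train_test_split(
    indices, labels, test_size=0.15, stratify=labels, random_state=42
)
# Stage 2: take 15% of the full dataset (0.15 / 0.85 of the remainder) as validation.
train_idx, val_idx, y_train, y_val = train_test_split(
    trainval_idx, y_trainval, test_size=0.15 / 0.85, stratify=y_trainval, random_state=42
)

print(len(train_idx), len(val_idx), len(test_idx))  # 1584 340 340
print(np.bincount(y_test).tolist())                 # [45, 209, 86]
```

The resulting sizes match the counts above, and the per-class test counts match the Support column of the results table. Reproducing the exact same test sentences would additionally require the same loading order and split calls as the original run, which the card does not specify.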

Training details

Setting	Value
Base encoder	`microsoft/MiniLM-L12-H384-uncased` (12 layers, hidden 384)
Parameters	33,361,155 (~33.4M)
Objective	class-weighted cross-entropy (handles label imbalance)
Optimizer	AdamW, weight decay 0.01
Learning rate	3e-5, linear schedule, 10% warmup
Batch size	16
Max sequence length	96 tokens
Epochs	8 (best validation checkpoint selected)
Hardware	CPU only
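The loss, optimizer, and schedule in this table can be wired together roughly as follows. This is a sketch under stated assumptions: the class weights use the common inverse-frequency ("balanced") formula N / (K · n_c), as in scikit-learn's `compute_class_weight("balanced")`, because the card does not say which weighting was used; a small `nn.Linear` head and a hand-made label tensor are hypothetical stand-ins for MiniLM and the real data; and the schedule is written with PyTorch's `LambdaLR`, reproducing the warmup-then-linear-decay shape of `transformers.get_linear_schedule_with_warmup`. Step counts follow from 1,584 training sentences at batch size 16 with no gradient accumulation (assumed): 99 steps per epoch, 792 in total, 79 of them warmup.

```python
import math

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

NUM_CLASSES = 3
TRAIN_SIZE, BATCH_SIZE, EPOCHS = 1584, 16, 8
steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)  # 99
total_steps = steps_per_epoch * EPOCHS                 # 792
warmup_steps = int(0.1 * total_steps)                  # 79 (10% warmup)


def balanced_class_weights(labels: torch.Tensor, num_classes: int = NUM_CLASSES) -> torch.Tensor:
    """Inverse-frequency weights w_c = N / (K * n_c) (assumed weighting scheme)."""
    counts = torch.bincount(labels, minlength=num_classes).float()
    return labels.numel() / (num_classes * counts)


def linear_warmup_decay(step: int) -> float:
    """LR multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical stand-ins: a linear head instead of MiniLM, and a tiny imbalanced label set.
model = nn.Linear(384, NUM_CLASSES)
toy_labels = torch.tensor([0, 1, 1, 1, 1, 2, 2, 2])  # neutral-heavy, like the real data

loss_fn = nn.CrossEntropyLoss(weight=balanced_class_weights(toy_labels))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
scheduler = LambdaLR(optimizer, lr_lambda=linear_warmup_decay)

# One illustrative step on a random batch (the LR multiplier is 0 at step 0 of warmup).
features = torch.randn(BATCH_SIZE, 384)
batch_labels = toy_labels.repeat(2)  # 16 labels to match the batch
loss = loss_fn(model(features), batch_labels)
loss.backward()
optimizer.step()
scheduler.step()  # advance the schedule once per optimizer step
optimizer.zero_grad()
```

With these toy labels the rarest class (0) gets the largest weight (8/3) and the most frequent class (1) the smallest (2/3), which is how class weighting pushes the loss toward minority classes.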

Model selection used best validation accuracy; the reported figures are then computed once on the untouched test set.
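The selection rule can be sketched as a loop that snapshots the weights whenever validation accuracy improves and restores the best snapshot before the single test evaluation. `fit_with_best_checkpoint`, `train_one_epoch`, and `evaluate` are hypothetical names standing in for the real training and validation code, and breaking ties in favour of the earliest epoch is also an assumption.

```python
import copy
from typing import Callable

import torch
import torch.nn as nn


def fit_with_best_checkpoint(
    model: nn.Module,
    train_one_epoch: Callable[[nn.Module], None],
    evaluate: Callable[[nn.Module], float],
    epochs: int = 8,
) -> tuple[int, float]:
    """Train for `epochs` epochs and keep the weights with the best validation accuracy."""
    best_acc, best_epoch = float("-inf"), 0
    best_state = copy.deepcopy(model.state_dict())
    for epoch in range(1, epochs + 1):
        train_one_epoch(model)
        val_acc = evaluate(model)  # validation split only; the test split is never used here
        if val_acc > best_acc:     # strict '>' keeps the earliest epoch on ties
            best_acc, best_epoch = val_acc, epoch
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)  # restore the selected checkpoint
    return best_epoch, best_acc


# Toy demo with a made-up validation curve (illustrative numbers, not the real run).
toy = nn.Linear(2, 3)
initial_bias = toy.bias.detach().clone()
fake_val_accs = iter([0.90, 0.93, 0.95, 0.94, 0.95, 0.92, 0.91, 0.93])


def fake_train(m: nn.Module) -> None:
    with torch.no_grad():
        m.bias.add_(1.0)  # change the weights so every epoch's snapshot differs


def fake_eval(m: nn.Module) -> float:
    return next(fake_val_accs)


best_epoch, best_acc = fit_with_best_checkpoint(toy, fake_train, fake_eval)
print(best_epoch, best_acc)  # 3 0.95
```

After the loop the toy model holds its epoch-3 weights rather than its final epoch-8 weights; only that restored model would then be scored once on the test set.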

Usage

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
model_id = "9mark9/finbert-minilm-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

texts = [
    "The company reported record quarterly profit, beating analyst expectations.",
    "Net sales decreased by 12% and the firm warned of further losses ahead.",
    "The board will meet next Tuesday to review the quarterly schedule.",
]

# Batch-tokenize; max_length=96 matches the sequence length used in training.
enc = tokenizer(texts, padding=True, truncation=True, max_length=96, return_tensors="pt")
with torch.no_grad():
    # Softmax over the class dimension converts logits to probabilities.
    probs = model(**enc).logits.softmax(-1)

# Print the top label (via the model's id2label mapping) and its probability.
for text, p in zip(texts, probs):
    label = model.config.id2label[int(p.argmax())]
    print(f"[{label:8s} {p.max():.2f}] {text}")
# [positive 0.97] The company reported record quarterly profit, ...
# [negative 0.98] Net sales decreased by 12% and the firm warned ...
# [neutral  0.99] The board will meet next Tuesday to review ...

Intended use

Sentiment scoring of short, formal financial text: news headlines, earnings summaries, analyst notes, regulatory filings.
Research, prototyping, and educational use as a compact, CPU-friendly alternative to 110M-parameter FinBERT.

Limitations

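Trained on the AllAgree subset, which skews neutral (1,391 of 2,264 sentences). Negative is the smallest class (45 of the 340 test sentences), so its metrics rest on the fewest examples and may be less reliable; on this test set, however, positive has the lowest F1 (0.9091).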
Domain is formal financial reporting language (Reuters-style). Performance on informal text (social media, retail-investor slang, emojis) is not characterised by this benchmark.
Not financial advice. Outputs are sentiment labels, not buy/sell signals.
Max sequence length is 96 tokens; longer documents are truncated.
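For documents longer than 96 tokens, one possible workaround, which this card does not evaluate, is to split the token sequence into overlapping windows, classify each window, and average the probabilities. A minimal sketch of the windowing step follows; `window_token_ids` is a hypothetical helper, the 16-token overlap is arbitrary, and reserving two positions per window for [CLS] and [SEP] assumes the BERT-style uncased vocabulary MiniLM uses.

```python
from typing import List


def window_token_ids(ids: List[int], max_length: int = 96, overlap: int = 16,
                     num_special: int = 2) -> List[List[int]]:
    """Split content token ids into overlapping windows that each fit in max_length.

    num_special reserves room for the special tokens added to every window
    ([CLS] and [SEP] for a BERT-style vocabulary; assumed here).
    """
    size = max_length - num_special  # content tokens per window (94 by default)
    step = size - overlap            # how far each window advances
    if step <= 0:
        raise ValueError("overlap must be smaller than max_length - num_special")
    # Window starts stop before len(ids) - overlap, so no window holds only
    # tokens that the previous window already covers.
    return [ids[start:start + size] for start in range(0, max(len(ids) - overlap, 1), step)]
```

A 200-token document yields windows of 94, 94, and 44 content tokens. Each window would then be wrapped with special tokens (for example with `tokenizer.build_inputs_with_special_tokens`) and classified; Hugging Face fast tokenizers can also produce such windows directly via `return_overflowing_tokens=True` together with a `stride` argument. Whether averaged window probabilities are accurate for this model has not been measured.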

Reproducibility

Metrics are reproducible: reloading the published model.safetensors and re-running inference on the same stratified test split (random_state=42) reproduces 95.29% accuracy exactly.
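Re-running the test-set evaluation amounts to batched inference plus an accuracy count. A minimal sketch, assuming `model` and `tokenizer` are loaded as in the Usage example and that the hypothetical `test_texts` / `test_labels` hold the 340 held-out sentences with gold label ids ordered as in `model.config.label2id`; `batch_size=32` is an arbitrary choice.

```python
import torch


@torch.no_grad()
def evaluate_accuracy(model, tokenizer, texts, labels, batch_size=32, max_length=96):
    """Fraction of sentences whose argmax prediction equals the gold label id."""
    model.eval()
    correct = 0
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Same tokenizer settings as the Usage example (truncation at 96 tokens).
        enc = tokenizer(batch, padding=True, truncation=True,
                        max_length=max_length, return_tensors="pt")
        preds = model(**enc).logits.argmax(dim=-1)
        gold = torch.tensor(labels[start:start + batch_size])
        correct += (preds == gold).sum().item()
    return correct / len(texts)
```

Calling `evaluate_accuracy(model, tokenizer, test_texts, test_labels)` on the correct split should return 324 / 340 ≈ 0.9529 if the claim above holds.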

Citation

Dataset:

@article{malo2014good,
  title={Good debt or bad debt: Detecting semantic orientations in economic texts},
  author={Malo, Pekka and Sinha, Ankur and Korhonen, Pekka and Wallenius, Jyrki and Takala, Pyry},
  journal={Journal of the Association for Information Science and Technology},
  volume={65}, number={4}, pages={782--796}, year={2014}
}

Contact

Maintainer: nagizaazs@gmail.com

License

MIT


Model files

Format: Safetensors · Tensor type: F32 · Size: 33.4M params

