finbert-sentfin
FinBERT fine-tuned on SEntFiN 1.0 — Indian financial news sentiment classification.
Fine-tuned from ProsusAI/finbert on the SEntFiN 1.0 dataset of human-annotated Indian financial news headlines sourced from the Economic Times and Moneycontrol, covering NSE500-listed companies.
Why This Model
ProsusAI/finbert was trained primarily on US/EU financial text (Reuters newswire and the Financial PhraseBank). On Indian financial headlines it scores approximately F1 = 0.76. This model addresses that domain gap by fine-tuning on expert-annotated Indian financial headlines from SEntFiN 1.0 (9,514 after filtering: 8,086 for training and 1,428 held out for validation). It reaches F1 = 0.873 on the validation split, a gain of about 11 F1 points (roughly 15% relative) on Indian financial text.
Model Details
| Property | Value |
|---|---|
| Base model | ProsusAI/finbert |
| Architecture | BERT-base (110M parameters) |
| Task | 3-class sentiment classification |
| Labels | positive (0), negative (1), neutral (2) |
| Training data | SEntFiN 1.0 — 8,086 samples |
| Validation data | SEntFiN 1.0 — 1,428 samples |
| Best epoch | 3 / 4 |
| F1 weighted | 0.873 |
| Accuracy | 0.873 |
| Training hardware | Kaggle T4 GPU (fp16) |
Training Configuration
TrainingArguments(
    num_train_epochs=4,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="f1_weighted",
)
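For a sense of scale, the schedule implied by these arguments can be worked out by hand. The sketch below assumes a single GPU, no gradient accumulation, and a final partial batch that is kept; none of these are stated in the configuration above, and the variable names are illustrative.

```python
import math

# Figures taken from this model card.
train_samples = 8086
batch_size = 32        # per_device_train_batch_size
epochs = 4             # num_train_epochs
warmup_steps = 100

# Assumption: one device, gradient_accumulation_steps=1, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(steps_per_epoch, total_steps, f"{warmup_fraction:.1%}")
```

Under those assumptions, warmup covers roughly the first 10% of about a thousand optimizer steps.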
Per-Class Results (Validation Set)
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| positive | 0.91 | 0.86 | 0.89 | 504 |
| negative | 0.87 | 0.89 | 0.88 | 407 |
| neutral | 0.84 | 0.87 | 0.86 | 517 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1428 |
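As a consistency check, the weighted averages can be recomputed from the per-class rows. The sketch below uses only the rounded figures from the table, so it lands close to, not exactly on, the reported 0.873.

```python
# Per-class (recall, f1, support) copied from the table above (rounded to two decimals).
rows = {
    "positive": (0.86, 0.89, 504),
    "negative": (0.89, 0.88, 407),
    "neutral":  (0.87, 0.86, 517),
}

total = sum(support for _, _, support in rows.values())

# Weighted F1: per-class F1 averaged with class support as the weight.
weighted_f1 = sum(f1 * support for _, f1, support in rows.values()) / total

# Support-weighted recall equals overall accuracy for single-label classification.
accuracy = sum(recall * support for recall, _, support in rows.values()) / total

print(total, round(weighted_f1, 3), round(accuracy, 3))
```

Both values fall within rounding distance of the headline metrics, which suggests the table and the summary figures agree.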
Dataset
SEntFiN 1.0 (paper, GitHub) is a human-annotated dataset of 10,753 financial news headlines with entity-sentiment annotations, covering 920 companies in the NSE500 index (2002–2017). Headlines were sourced from the Economic Times and Moneycontrol.
Preprocessing applied:
- Headlines with multiple entities having conflicting sentiments were dropped (1,239 rows)
- Remaining 9,514 rows used for training and validation
- Stratified 85/15 train/validation split
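The preprocessing steps can be sketched in plain Python. This is an illustration, not the author's script: the row layout (a headline plus a dict mapping each entity to its sentiment), the helper names, and the fixed seed are all assumptions, and the toy rows are made up.

```python
import random
from collections import defaultdict

def headline_label(entity_sentiments):
    """Return the sentiment shared by all entities, or None if they disagree."""
    labels = set(entity_sentiments.values())
    return labels.pop() if len(labels) == 1 else None

def stratified_split(rows, val_fraction=0.15, seed=42):
    """Split (text, label) rows so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val

# Toy rows (hypothetical): headline text and entity -> sentiment annotations.
raw = [
    ("A gains while B slips", {"A": "positive", "B": "negative"}),  # conflicting, dropped
    ("C and D both rally", {"C": "positive", "D": "positive"}),     # agreeing, kept
    ("E holds steady", {"E": "neutral"}),
]
clean = [(text, headline_label(ents)) for text, ents in raw]
clean = [(text, label) for text, label in clean if label is not None]
print(clean)
```

Splitting per label, instead of shuffling the whole set once, is what keeps the class proportions similar between the training and validation sets.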
Usage
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="tahp0604/finbert-sentfin",
)

headlines = [
    "Reliance Industries posts record quarterly profit",
    "HDFC Bank shares fall 4% after RBI penalty",
    "TCS Q4 results in line with analyst estimates",
]

for h in headlines:
    result = pipe(h)[0]
    print(f"[{result['label'].upper():8s} {result['score']:.2f}] {h}")
Output:
[POSITIVE 0.94] Reliance Industries posts record quarterly profit
[NEGATIVE 0.91] HDFC Bank shares fall 4% after RBI penalty
[NEUTRAL  0.83] TCS Q4 results in line with analyst estimates
Direct inference
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tahp0604/finbert-sentfin")
model = AutoModelForSequenceClassification.from_pretrained("tahp0604/finbert-sentfin")
model.eval()

def predict(text):
    # Tokenize a single headline, truncating to the 128-token limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    idx = int(probs.argmax())
    label = model.config.id2label[idx]
    return label, float(probs[idx])

label, confidence = predict("Infosys raises revenue guidance for FY26")
print(f"{label} ({confidence:.2f})")  # positive (0.89)
Label Mapping
id2label = {0: "positive", 1: "negative", 2: "neutral"}
label2id = {"positive": 0, "negative": 1, "neutral": 2}
This is identical to ProsusAI/finbert — the model is a drop-in replacement.
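To make the mapping concrete, here is a small sketch of how a row of three logits turns into a label and a confidence. The logit values are invented for illustration; only the `id2label` dictionary comes from this card.

```python
import math

id2label = {0: "positive", 1: "negative", 2: "neutral"}
label2id = {label: idx for idx, label in id2label.items()}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return id2label[idx], probs[idx]

# Hypothetical logits in class-index order: positive, negative, neutral.
label, confidence = decode([2.0, -1.0, 0.5])
print(label, round(confidence, 2))
```

Because the index order matches ProsusAI/finbert, code that already decodes that model's outputs this way should not need a remapping step.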
Intended Use
- Sentiment scoring of Indian financial news headlines
- NSE/BSE stock news sentiment analysis
- Component in financial signal fusion pipelines
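For the signal-fusion use case, one simple option is to collapse each prediction into a signed score before combining it with other signals. The sketch below is one possible convention, not something the model prescribes: the function name, the sign convention, and the plain-mean aggregation are assumptions, and the input mirrors the `{'label': ..., 'score': ...}` dicts returned by the pipeline in the Usage section.

```python
def signed_sentiment(result):
    """Map a pipeline result to [-1, 1]: positive > 0, negative < 0, neutral = 0."""
    sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
    return sign[result["label"].lower()] * result["score"]

# Results shaped like the pipeline output shown in the Usage section.
results = [
    {"label": "positive", "score": 0.94},
    {"label": "negative", "score": 0.91},
    {"label": "neutral", "score": 0.83},
]
scores = [signed_sentiment(r) for r in results]
daily_signal = sum(scores) / len(scores)  # plain mean; any weighting is left open
print(scores, round(daily_signal, 3))
```

Treating neutral as zero discards its confidence; a pipeline that cares about how certain the model is of "no signal" would need a different encoding.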
Limitations
- Trained on headlines from 2002–2017 — may not capture terminology from newer financial instruments or recent regulatory changes
- Optimised for short headlines (≤ 128 tokens) — performance may degrade on full article text
- Covers Indian equities (NSE500) — may underperform on commodities, currencies, or global markets
- Not intended for investment decisions
Citation
If you use this model, please cite the SEntFiN dataset and the FinBERT base model:
@article{sinha2022sentfin,
  title={SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News},
  author={Sinha, Ankur and Kedas, Satishwar and Kumar, Rishu and Malo, Pekka},
  journal={Journal of the Association for Information Science and Technology},
  volume={73},
  number={9},
  pages={1314--1335},
  year={2022}
}

@article{araci2019finbert,
  title={FinBERT: Financial Sentiment Analysis with Pre-Trained Language Models},
  author={Araci, Dogu},
  journal={arXiv preprint arXiv:1908.10063},
  year={2019}
}