finbert-sentfin

FinBERT fine-tuned on SEntFiN 1.0 — Indian financial news sentiment classification.

Fine-tuned from ProsusAI/finbert on the SEntFiN 1.0 dataset of human-annotated Indian financial news headlines sourced from the Economic Times and Moneycontrol, covering NSE500-listed companies.

Why This Model

ProsusAI/finbert was trained primarily on US/EU financial text (SEC filings, financial news wire). On Indian financial headlines it scores approximately F1 = 0.76. This model addresses that domain gap by fine-tuning on SEntFiN 1.0: of the 9,514 human-annotated Indian financial headlines that remain after filtering, 8,086 are used for training and 1,428 are held out for validation. It reaches a weighted F1 of 0.873 on the held-out split, a gain of about 0.11 absolute (roughly 15% relative) over the base model's score on Indian financial text.

Model Details

Property           Value
Base model         ProsusAI/finbert
Architecture       BERT-base (110M parameters)
Task               3-class sentiment classification
Labels             positive (0), negative (1), neutral (2)
Training data      SEntFiN 1.0 (8,086 samples)
Validation data    SEntFiN 1.0 (1,428 samples)
Best epoch         3 of 4
F1 (weighted)      0.873
Accuracy           0.873
Training hardware  Kaggle T4 GPU (fp16)

Training Configuration

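from transformers import TrainingArguments

# Excerpt only: output_dir and the evaluation/save strategy arguments are not shown.
# load_best_model_at_end=True requires the evaluation and save strategies to match.
args = TrainingArguments(
    num_train_epochs=4,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,                            # mixed precision on the T4 GPU
    load_best_model_at_end=True,          # restore the best checkpoint after training
    metric_for_best_model="f1_weighted",  # key must be returned by compute_metrics
)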
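
The metric_for_best_model="f1_weighted" setting only works if the evaluation function passed to the Trainer returns a key with that name. The card does not show that function, so the following is a minimal sketch under assumptions: the names weighted_f1 and compute_metrics are illustrative, and the support-weighted F1 is written out in plain Python rather than taken from whatever library the original training script used.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n  # weight each class by its number of true examples
    return total / len(y_true)

def compute_metrics(eval_pred):
    """Hypothetical Trainer hook: takes (logits, labels), returns named metrics."""
    logits, labels = eval_pred
    # argmax over each row of logits gives the predicted class id
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return {"f1_weighted": weighted_f1(labels, preds), "accuracy": accuracy}
```

The dictionary key is what ties the function to the configuration: if it were named differently, the Trainer could not select the best checkpoint by this metric.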

Per-Class Results (Validation Set)

Class         Precision  Recall  F1    Support
positive      0.91       0.86    0.89  504
negative      0.87       0.89    0.88  407
neutral       0.84       0.87    0.86  517
weighted avg  0.87       0.87    0.87  1428
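
The weighted avg row is the support-weighted mean of the per-class rows. As a quick consistency check, the sketch below recomputes it from the rounded figures in the table; the variable and function names are illustrative.

```python
# Per-class rows copied from the table: (precision, recall, f1, support)
rows = {
    "positive": (0.91, 0.86, 0.89, 504),
    "negative": (0.87, 0.89, 0.88, 407),
    "neutral":  (0.84, 0.87, 0.86, 517),
}

total_support = sum(r[3] for r in rows.values())

def support_weighted(col):
    """Average one metric column, weighting each class by its support."""
    return sum(r[col] * r[3] for r in rows.values()) / total_support

weighted = {
    "precision": support_weighted(0),
    "recall": support_weighted(1),
    "f1": support_weighted(2),
}

for name, value in weighted.items():
    print(f"{name:9s} {value:.3f}")
```

This gives about 0.873 for precision, 0.872 for recall, and 0.876 for F1. The per-class inputs are rounded to two decimals, so differences from the reported 0.873 in the third decimal are expected. Weighted recall equals overall accuracy by definition, which is consistent with the accuracy reported in Model Details.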

Dataset

SEntFiN 1.0 is a human-annotated dataset of 10,753 financial news headlines with entity-sentiment annotations, covering 920 companies in the NSE500 index (2002–2017). Headlines were sourced from the Economic Times and Moneycontrol.

Preprocessing applied:

  • Headlines with multiple entities having conflicting sentiments were dropped (1,239 rows)
  • Remaining 9,514 rows kept for training and validation
  • Stratified 85/15 train/validation split (8,086 / 1,428 rows)
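
The preprocessing steps above can be sketched as follows. This is an illustration under assumptions, not the author's script: each row is assumed to be a (headline, {entity: sentiment}) pair, the helper names and the seed are hypothetical, and the split uses only the standard library, so exact split sizes may differ by a row or two from the reported 8,086 / 1,428.

```python
import random
from collections import defaultdict

def drop_conflicting(rows):
    """Keep (headline, label) for rows whose entities all share one sentiment."""
    kept = []
    for headline, entity_sentiments in rows:
        labels = set(entity_sentiments.values())
        if len(labels) == 1:
            kept.append((headline, labels.pop()))
    return kept

def stratified_split(examples, val_fraction=0.15, seed=42):
    """Split per class so train and validation keep similar label proportions."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    rng = random.Random(seed)  # seed value is an assumption
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val

# Toy rows, invented for illustration only
rows = [
    ("Company A profit jumps; Company B slips",
     {"Company A": "positive", "Company B": "negative"}),
    ("Company C wins large order", {"Company C": "positive"}),
    ("Company D, Company E shares tumble",
     {"Company D": "negative", "Company E": "negative"}),
]
print(drop_conflicting(rows))
```

Only the first toy row is dropped, because its two entities disagree. The third row is kept since both entities carry the same sentiment.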

Usage

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="tahp0604/finbert-sentfin",
)

headlines = [
    "Reliance Industries posts record quarterly profit",
    "HDFC Bank shares fall 4% after RBI penalty",
    "TCS Q4 results in line with analyst estimates",
]

for h in headlines:
    result = pipe(h)[0]
    print(f"[{result['label'].upper():8s} {result['score']:.2f}]  {h}")

Output:

[POSITIVE 0.94]  Reliance Industries posts record quarterly profit
[NEGATIVE 0.91]  HDFC Bank shares fall 4% after RBI penalty
[NEUTRAL  0.83]  TCS Q4 results in line with analyst estimates

Direct inference

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tahp0604/finbert-sentfin")
model     = AutoModelForSequenceClassification.from_pretrained("tahp0604/finbert-sentfin")
model.eval()

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    idx   = int(probs.argmax())
    label = model.config.id2label[idx]
    return label, float(probs[idx])

label, confidence = predict("Infosys raises revenue guidance for FY26")
print(f"{label} ({confidence:.2f})")  # positive (0.89)
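
The core of predict is a softmax over the three logits followed by an argmax lookup in id2label. The standalone sketch below reproduces that step in plain Python so it can be inspected without loading the model. The function name decode_logits and the example logits are made up for illustration.

```python
import math

# Mapping as documented in the Label Mapping section of this card
ID2LABEL = {0: "positive", 1: "negative", 2: "neutral"}

def decode_logits(logits):
    """Return (label, probability) for one row of raw class logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return ID2LABEL[idx], probs[idx]

label, confidence = decode_logits([2.1, -0.4, 0.3])  # invented logits
print(f"{label} ({confidence:.2f})")
```

The returned confidence is the softmax probability of the winning class only. It says nothing about how the remaining probability is split between the other two classes.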

Label Mapping

id2label = {0: "positive", 1: "negative", 2: "neutral"}
label2id = {"positive": 0, "negative": 1, "neutral": 2}

This is identical to ProsusAI/finbert — the model is a drop-in replacement.

Intended Use

  • Sentiment scoring of Indian financial news headlines
  • NSE/BSE stock news sentiment analysis
  • Component in financial signal fusion pipelines
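
For use in a signal fusion pipeline, the three-class output usually has to be reduced to a single number per stock. One simple option, shown here purely as an assumption and not something this card prescribes, is a signed score equal to P(positive) minus P(negative), averaged over a stock's headlines. The function names and probability values are hypothetical.

```python
def signed_score(probs):
    """Map class probabilities to a score in [-1, 1]: P(positive) - P(negative)."""
    return probs["positive"] - probs["negative"]

def aggregate(headline_probs):
    """Average signed scores over headlines; 0.0 when there are none."""
    if not headline_probs:
        return 0.0
    return sum(signed_score(p) for p in headline_probs) / len(headline_probs)

# Invented probabilities for three headlines about one stock
day = [
    {"positive": 0.90, "negative": 0.03, "neutral": 0.07},
    {"positive": 0.10, "negative": 0.70, "neutral": 0.20},
    {"positive": 0.20, "negative": 0.10, "neutral": 0.70},
]
print(round(aggregate(day), 3))
```

The per-class probabilities can be read from the probs tensor in the Direct inference example. A neutral headline pulls the average toward zero without changing its sign, which is one reason to prefer this reduction over counting labels.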

Limitations

  • Trained on headlines from 2002–2017 — may not capture terminology from newer financial instruments or recent regulatory changes
  • Optimised for short headlines (≤ 128 tokens) — performance may degrade on full article text
  • Covers Indian equities (NSE500) — may underperform on commodities, currencies, or global markets
  • Not intended for investment decisions

Citation

If you use this model, please cite the SEntFiN dataset and the FinBERT base model:

@article{sinha2022sentfin,
  title={SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News},
  author={Sinha, Ankur and Kedas, Satishwar and Kumar, Rishu and Malo, Pekka},
  journal={Journal of the Association for Information Science and Technology},
  volume={73},
  number={9},
  pages={1314--1335},
  year={2022}
}

@article{araci2019finbert,
  title={FinBERT: Financial Sentiment Analysis with Pre-Trained Language Models},
  author={Araci, Dogu},
  journal={arXiv preprint arXiv:1908.10063},
  year={2019}
}