FinBERT is a BERT model pre-trained on financial communication text, intended to advance financial NLP research and practice. It is trained on the following three financial communication corpora, which total 4.9B tokens:
- Corporate Reports (10-K & 10-Q): 2.5B tokens
- Earnings Call Transcripts: 1.3B tokens
- Analyst Reports: 1.1B tokens
This released version of FinBERT is fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. It is periodically updated with fresh data and annotations as financial language changes. This version is based on yiyanghkust/finbert-tone.
It builds on the following academic work:
Huang, Allen H., Hui Wang, and Yi Yang. "FinBERT: A Large Language Model for Extracting Information from Financial Text." Contemporary Accounting Research (2022).
How to use
You can use this model with the Transformers pipeline for sentiment analysis:
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

# Load the fine-tuned classifier (three sentiment classes) and its tokenizer.
finbert = BertForSequenceClassification.from_pretrained('Hatman/finbert', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('Hatman/finbert')
nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

sentences = [
    "there is a shortage of capital, and we need extra financing",
    "growth is strong and we have plenty of liquidity",
    "there are doubts about our finances",
    "profits are flat",
]
results = nlp(sentences)
print(results)  # LABEL_0: neutral; LABEL_1: positive; LABEL_2: negative
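The pipeline returns a list of dictionaries with `label` and `score` keys. As a minimal sketch, the generic labels can be translated into the sentiment names listed in the comment above. This assumes the checkpoint emits `LABEL_0` through `LABEL_2`. The helper name `to_readable` and the sample scores are hypothetical and only for illustration.

```python
# Mapping taken from the comment in the usage example above.
LABEL_NAMES = {
    "LABEL_0": "neutral",
    "LABEL_1": "positive",
    "LABEL_2": "negative",
}


def to_readable(results):
    """Return pipeline-style results with human-readable sentiment labels.

    Labels not present in LABEL_NAMES are passed through unchanged.
    """
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in results
    ]


# Illustrative input shaped like pipeline output; the scores are made up.
sample = [
    {"label": "LABEL_2", "score": 0.99},
    {"label": "LABEL_1", "score": 0.98},
]
print(to_readable(sample))
```

In practice you would pass the `results` list from the example above instead of `sample`. If the model's configuration already defines readable label names, the lookup simply leaves them untouched.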