---
language: "en"
tags:
- financial-sentiment-analysis
- sentiment-analysis
widget:
- text: "growth is strong and we have plenty of liquidity"
---

`FinBERT` is a BERT model pre-trained on financial communication text to support financial NLP research and practice. It is trained on the following three financial communication corpora, which total 4.9B tokens:
- Corporate Reports 10-K & 10-Q: 2.5B tokens
- Earnings Call Transcripts: 1.3B tokens
- Analyst Reports: 1.1B tokens

More technical details on `FinBERT` are available in the [GitHub repository](https://github.com/yya518/FinBERT).

Please check out our working paper [`FinBERT—A Deep Learning Approach to Extracting Textual Information`](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3910214).

The released `finbert-tone` model is the `FinBERT` model fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. It achieves superior performance on the financial tone analysis task. If you simply want to use `FinBERT` for financial tone analysis, give it a try.

# How to use
You can use this model with the Transformers `pipeline` for sentiment analysis:
```python
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

# Load the fine-tuned three-class tone classifier and its tokenizer.
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",
             "growth is strong and we have plenty of liquidity",
             "there are doubts about our finances",
             "profits are flat"]
results = nlp(sentences)  # one {'label': ..., 'score': ...} dict per sentence
print(results)  # LABEL_0: neutral; LABEL_1: positive; LABEL_2: negative
```
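
The pipeline returns one dictionary per sentence with `label` and `score` keys. As a minimal sketch, the raw labels can be turned into tone names using the mapping given in the comment above. The helper `readable` and the `example` list are hypothetical: the list is a hand-written placeholder in the pipeline's output shape, not real model output. Depending on the model configuration, the pipeline may already return readable label names, so unknown labels are passed through in lower case rather than rejected.

```python
# Label meanings as stated in the usage example: 0 neutral, 1 positive, 2 negative.
LABEL_NAMES = {"LABEL_0": "neutral", "LABEL_1": "positive", "LABEL_2": "negative"}


def readable(results):
    """Return (tone, score) pairs; labels missing from LABEL_NAMES pass through lower-cased."""
    return [(LABEL_NAMES.get(r["label"], r["label"].lower()), r["score"]) for r in results]


# Hypothetical placeholder in the pipeline's output shape; NOT real model output.
example = [{"label": "LABEL_1", "score": 0.9}, {"label": "LABEL_2", "score": 0.8}]
for tone, score in readable(example):
    print(f"{tone}: {score:.2f}")
```

In practice, `readable(results)` would be called on the list returned by `nlp(sentences)`.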