Jean-Baptiste committed on
Commit e61d990 · 1 Parent(s): 30e27d7

Update README.md

  1. README.md +60 -1
README.md CHANGED
@@ -9,4 +9,63 @@ widget:
  - text: "Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power"
 
  license: mit
- ---
+ ---
+
+ # Model fine-tuned from roberta-large for sentiment classification of financial news (emphasis on Canadian news).
+
+ ### Introduction
+ This model was trained on the financial_news_sentiment_mixte_with_phrasebank_75 dataset.
+ This is a customized version of the phrasebank dataset in which I kept only sentences validated by at least 75% of annotators.
+ In addition, I added ~2000 manually validated articles on Canadian financial news. The model is therefore more specifically trained for Canadian news.
+ The final result is an F1 score of 93.25% overall and 83.6% on Canadian news.
+
+
+
+
+ ### Training data
+ The training data was labeled as follows:
+
+ class |Description
+ -|-
+ 0 |negative
+ 1 |neutral
+ 2 |positive
+
+
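The class ids in the table above matter when you work with raw model logits rather than the pipeline. The following is a minimal sketch, assuming the logits are ordered by class id as in the table (the source does not confirm this; checking `model.config.id2label` on the loaded model is the safer route). The `ID2LABEL` dict, the `softmax` and `predict_label` helpers, and the example logits are hypothetical names and values used for illustration only.

```python
import math

# Assumed mapping, copied from the class table above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one sentence, for illustration only.
label, prob = predict_label([2.1, 0.3, -1.2])
print(label, round(prob, 3))
```

With a real model, the logits would come from `model(**tokenizer(text, return_tensors="pt")).logits`, converted to a plain list before being passed to `predict_label`.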
+ ### How to use roberta-large-financial-news-sentiment-en with HuggingFace
+
+ ##### Load roberta-large-financial-news-sentiment-en and its sub-word tokenizer:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
+ model = AutoModelForSequenceClassification.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
+
+
+ # Process a text sample (a financial news excerpt)
+
+ from transformers import pipeline
+
+ pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
+ pipe("Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power")
+
+ # Output: [{'label': 'negative', 'score': 0.9399105906486511}]
+
+ ```
+
+ ### Model performance
+
+ Overall F1 score (macro average)
+
+ precision|recall|f1
+ -|-|-
+ 0.9355|0.9299|0.9325
+
+ By class
+
+ class|precision|recall|f1
+ -|-|-|-
+ negative|0.9605|0.9240|0.9419
+ neutral|0.9538|0.9459|0.9498
+ positive|0.8922|0.9200|0.9059
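
As a sanity check on the tables above, a macro average is the unweighted mean of the per-class scores. The sketch below uses only the per-class numbers reported in this README; the `per_class` dict and the `macro_average` helper are illustrative names, not part of the model's code. Because the inputs are already rounded, the result can differ from the reported overall figures in the last digit.

```python
# Per-class (precision, recall, f1) as reported in the table above.
per_class = {
    "negative": (0.9605, 0.9240, 0.9419),
    "neutral": (0.9538, 0.9459, 0.9498),
    "positive": (0.8922, 0.9200, 0.9059),
}


def macro_average(scores):
    """Unweighted mean of each metric across classes."""
    n = len(scores)
    columns = zip(*scores.values())
    return tuple(sum(col) / n for col in columns)


precision, recall, f1 = macro_average(per_class)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The macro average gives each class equal weight regardless of how many examples it has, so the weaker positive class pulls the overall score down as much as the two stronger classes pull it up.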