Jean-Baptiste's picture
Update README.md
ffe64b8
metadata
language: en
tags:
  - financial
  - stocks
  - sentiment
datasets:
  - Jean-Baptiste/financial_news_sentiment_mixte_with_phrasebank_75
widget:
  - text: >-
      Fortuna Silver Mines Inc. (NYSE: FSM) (TSX: FVI) reports solid production
      results for the third quarter of 2022 from its four operating mines in the
      Americas and West Africa.
  - text: >-
      Theratechnologies Reports Financial Results for the Third Quarter of
      Fiscal 2022 and Provides Business Update -- - Q3 2022 Consolidated Revenue
      Growth of 17% to $20.8 million
  - text: The Flowr Corporation To File for CCAA Protection
  - text: >-
      Melcor REIT (TSX: MR.UN) today announced results for the third quarter
      ended September 30, 2022. Revenue was stable in the quarter and
      year-to-date. Net operating income was down 3% in the quarter at $11.61
      million due to the timing of operating expenses and inflated costs
      including utilities like gas/heat and power
license: mit

Model fine-tuned from roberta-large for sentiment classification of financial news (emphasis on Canadian news).

Introduction

This model was train on financial_news_sentiment_mixte_with_phrasebank_75 dataset. This is a customized version of the phrasebank dataset in which I kept only sentence validated by at least 75% annotators. In addition I added ~2000 articles validated manually on Canadian financial news. Therefore the model is more specifically trained for Canadian news. Final result is f1 score of 93.25% overall and 83.6% on Canadian news.

Training data

Training data was classified as follow:

class Description
0 negative
1 neutral
2 positive

How to use roberta-large-financial-news-sentiment-en with HuggingFace

Load roberta-large-financial-news-sentiment-en and its sub-word tokenizer :
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
model = AutoModelForSequenceClassification.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")


##### Process text sample (from wikipedia)

from transformers import pipeline

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
pipe("Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power")

[{'label': 'negative', 'score': 0.9399105906486511}]

Model performances

Overall f1 score (average macro)

precision recall f1
0.9355 0.9299 0.9325

By entity

entity precision recall f1
negative 0.9605 0.9240 0.9419
neutral 0.9538 0.9459 0.9498
positive 0.8922 0.9200 0.9059