language:
  - ru
  - en
license: mit
tags:
  - finance
  - sentiment
  - stocks
metrics:
  - accuracy
widget:
  - text: Нуу, эту папиру надо лонговать!
    example_title: long sentiment
  - text: Не уверен. Нужно подумать, перед тем, как брать.
    example_title: neutral sentiment
  - text: Такое только хомяки берут. Нужно сливать эту бумажку поскорее.
    example_title: short sentiment

Model Details

Model Description

  • Developed by: Alexander Nikitin
  • Model type: XLM-RoBERTa-base, fine-tuned on the author's own labelled dataset
  • Language(s) (NLP): Russian, English
  • License: MIT
  • Finetuned from model: FacebookAI/xlm-roberta-base

Dataset

This transformer model was fine-tuned on comments parsed from "Tinkoff Pulse".

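First step: the comments were preprocessed. For each stock ticker mentioned in a comment, the part of the comment that refers to that ticker was extracted. Example: "{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. По Ростелекому не очень. Тинек идет вниз!" -> "{$GAZP} По газрому все хорошо." (In English, roughly: "Gazprom is doing fine. Rostelecom not so much. Tinkoff is going down!" -> "Gazprom is doing fine.")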
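The card does not say how the per-ticker split was implemented. A minimal sketch of one way to do it, assuming the comment is split into sentences and each sentence is assigned to a ticker by keyword match, could look like this. The `ALIASES` table and the `split_by_ticker` helper are hypothetical and only cover the three tickers from the example.

```python
import re

# Matches ticker tags such as {$GAZP}.
TICKER_RE = re.compile(r"\{\$([A-Z0-9]+)\}")
# Splits on whitespace that follows sentence-ending punctuation.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")

# Hypothetical alias table (an assumption, not from the model card):
# lowercase substrings that hint a sentence is about a given ticker.
ALIASES = {
    "GAZP": ("газ",),
    "RTKM": ("ростел",),
    "TCSG": ("тинек", "тиньк"),
}


def split_by_ticker(comment, aliases=ALIASES):
    """Return {ticker: "{$TICKER} sentences about it"} for one raw comment."""
    tickers = TICKER_RE.findall(comment)
    body = TICKER_RE.sub("", comment).strip()
    sentences = [s for s in SENTENCE_SPLIT_RE.split(body) if s]

    result = {}
    for ticker in tickers:
        keys = aliases.get(ticker, ())
        matched = [s for s in sentences if any(k in s.lower() for k in keys)]
        if matched:  # tickers with no matching sentence are skipped
            result[ticker] = "{$" + ticker + "} " + " ".join(matched)
    return result


raw = ("{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. "
       "По Ростелекому не очень. Тинек идет вниз!")
parts = split_by_ticker(raw)
print(parts["GAZP"])
```

Keyword matching is a deliberately simple choice here; it breaks on sentences that mention several companies at once.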

Next step: a dataset of 10K preprocessed comments, evenly distributed across 10 Russian stocks, was labelled. The Mistral-7b LLM was used to assign each comment to one of 3 categories: "buy" - the author wants or encourages others to buy (long); "sell" - the author wants or encourages others to sell or short; "neutral" - the comment is news or the sentiment cannot be determined with confidence. Plans for further research: label 100K comments and train on them.
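The exact prompt given to Mistral-7b is not published in this card. The sketch below only illustrates the general shape of LLM-based labelling: build a prompt from the three category definitions, then normalise the free-text answer to one of the three labels. `PROMPT_TEMPLATE`, `build_prompt` and `parse_label` are hypothetical names, and the call to the LLM itself is left out.

```python
import re

# Integer ids chosen to agree with the LABEL_0/1/2 order used by this model.
LABEL_TO_ID = {"sell": 0, "neutral": 1, "buy": 2}

# Hypothetical prompt (an assumption, not the author's actual prompt).
PROMPT_TEMPLATE = (
    "You are labelling comments about stocks.\n"
    'Answer with exactly one word: "buy", "sell" or "neutral".\n'
    "buy: the author wants or encourages others to buy (long).\n"
    "sell: the author wants or encourages others to sell or short.\n"
    "neutral: the comment is news or the sentiment is unclear.\n"
    "Comment: {comment}\n"
    "Answer:"
)


def build_prompt(comment):
    """Fill the template with one preprocessed comment."""
    return PROMPT_TEMPLATE.format(comment=comment.strip())


def parse_label(response):
    """Extract the first label word from the LLM answer.

    Falls back to "neutral" when no label word is found, mirroring the
    "cannot say for sure" rule for the neutral class.
    """
    match = re.search(r"\b(buy|sell|neutral)\b", response.lower())
    return match.group(1) if match else "neutral"


prompt = build_prompt("{$GAZP} По газрому все хорошо.")
label = parse_label("Buy.")
print(label, LABEL_TO_ID[label])
```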

Bias, Risks, and Limitations

  1. The model is trained on Russian and English comments;
  2. The model is not good at extracting sentiment from comments that contain strong keywords pointing in opposite directions, like "I wanna sell. But probably I should buy back later.";
  3. The model performs well on short to medium-length texts such as comments, which are usually skewed to one side (strong buy or strong sell).

Recommendations

How to Get Started with the Model

Load the model with the Hugging Face transformers pipeline and use it!

Labels:

  • LABEL_0 = SELL
  • LABEL_1 = NEUTRAL
  • LABEL_2 = BUY
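A minimal sketch of that workflow, assuming the `transformers` library is installed. The repository id is not stated in this card, so `model_id` is left as a parameter, and `to_readable` and `classify` are hypothetical helper names. The label mapping comes from the list above.

```python
# Mapping from the raw pipeline labels to the names listed in this card.
LABEL_NAMES = {"LABEL_0": "SELL", "LABEL_1": "NEUTRAL", "LABEL_2": "BUY"}


def to_readable(predictions):
    """Replace LABEL_n with SELL/NEUTRAL/BUY in text-classification output."""
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


def classify(texts, model_id):
    """Run the classifier on a list of strings.

    model_id is the Hugging Face repository id of this model; it is not
    given in the card, so the caller has to supply it.
    """
    from transformers import pipeline  # imported lazily; needs transformers

    classifier = pipeline("text-classification", model=model_id)
    return to_readable(classifier(texts))


# Example call (requires network access and the real repository id):
# classify(["Нуу, эту папиру надо лонговать!"], model_id="<repository id>")
```

The pipeline returns one dictionary per input text with a `label` and a `score`; only the label is renamed here.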

Evaluation

  • Accuracy on the validation dataset: 0.786
  • Note: this accuracy was measured on ~1.5k comments.
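Accuracy here is the share of validation comments whose predicted label equals the reference label. A small sketch of the computation follows; the four-item lists are toy data for illustration only, not the validation set.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example: 3 of 4 predictions match.
toy_predicted = ["BUY", "SELL", "NEUTRAL", "BUY"]
toy_gold = ["BUY", "SELL", "BUY", "BUY"]
toy_accuracy = accuracy(toy_predicted, toy_gold)
print(toy_accuracy)
```

For scale, 0.786 on a set of exactly 1,500 comments would correspond to 1,179 correct predictions; the card only gives the size as approximately 1.5k.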

Model Card Authors

https://t.me/pivo_txt