---
language:
- ru
- en
license: mit
tags:
- finance
- sentiment
- stocks
metrics:
- accuracy
widget:
- text: Нуу, эту папиру надо лонговать!
  example_title: long sentiment
- text: Не уверен. Нужно подумать, перед тем, как брать.
  example_title: neutral sentiment
- text: Такое только хомяки берут. Нужно сливать эту бумажку поскорее.
  example_title: short sentiment
---

## Model Details

### Model Description

- **Developed by:** Alexander Nikitin
- **Model type:** XLM-RoBERTa-base, fine-tuned on a custom labelled dataset
- **Language(s) (NLP):** Russian, English
- **License:** MIT
- **Finetuned from model:** FacebookAI/xlm-roberta-base

## Dataset

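This transformer model was fine-tuned on comments parsed from "Tinkoff Pulse".

First step:
Comments were preprocessed: for each stock ticker mentioned in a comment, the part of the comment that refers to that ticker was extracted.
Example: "{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. По Ростелекому не очень. Тинек идет вниз!" -> "{$GAZP} По газрому все хорошо." (roughly: "Gazprom is doing fine. Rostelecom, not so much. Tinkoff is going down!" -> "Gazprom is doing fine.")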
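
The card does not describe how the per-ticker extraction was implemented. The snippet below is only a minimal sketch of one possible approach: it assumes a comment can be split into sentences and that each sentence can be matched to a ticker through a hand-written alias table. The `ALIASES` table and the `split_by_ticker` helper are hypothetical and are not taken from the original pipeline.

```python
import re

# Hypothetical alias table: lowercase substrings that hint at each ticker.
# The real preprocessing may have used a different matching strategy.
ALIASES = {
    "GAZP": ("газпром", "газром"),
    "RTKM": ("ростел",),
    "TCSG": ("тинек", "тиньк"),
}

# Matches ticker tags of the form {$GAZP}.
TICKER_TAG = re.compile(r"\{\$([A-Z]+)\}")


def split_by_ticker(comment: str) -> dict[str, str]:
    """Return, for each tagged ticker, the sentences that mention one of its aliases."""
    tickers = TICKER_TAG.findall(comment)
    body = TICKER_TAG.sub("", comment).strip()
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", body)
    result = {}
    for ticker in tickers:
        matched = [
            s for s in sentences
            if any(alias in s.lower() for alias in ALIASES.get(ticker, ()))
        ]
        if matched:
            result[ticker] = "{$" + ticker + "} " + " ".join(matched)
    return result


comment = (
    "{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. "
    "По Ростелекому не очень. Тинек идет вниз!"
)
print(split_by_ticker(comment)["GAZP"])
```

Substring matching on aliases is deliberately crude; it is enough to reproduce the example above, but it would miss sentences that refer to a stock without naming it.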

Next step:
A dataset of 10K preprocessed comments, evenly distributed across 10 Russian stocks, was labelled.
The Mistral-7b LLM was used to assign each comment to one of 3 categories: "buy" if the author wants or encourages others to buy (go long), "sell" if the author wants or encourages others to sell or short, and "neutral" if the comment is news or the intent cannot be determined.
Plans for further research: label 100K comments and train on them.

## Bias, Risks, and Limitations

1. The model is trained only on Russian and English comments;
2. The model struggles with comments that contain strong keywords pointing in opposite directions, such as "I wanna sell. But probably I should buy back later.";
3. The model performs well on short to medium-length texts such as comments, which are usually skewed to one side (strong buy or strong sell).

## How to Get Started with the Model

Load the model with the Hugging Face `pipeline` API and use it!

Labels:
- LABEL_0 = SELL
- LABEL_1 = NEUTRAL
- LABEL_2 = BUY
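
A minimal sketch of how this might look with the `transformers` library, assuming the model is loaded through the standard `text-classification` pipeline. The card does not state the model's Hub id, so it is left as a `model_id` argument. The `readable` and `classify` helpers are illustrative names, not part of the model.

```python
# Mapping taken from the label list above.
LABEL_NAMES = {"LABEL_0": "SELL", "LABEL_1": "NEUTRAL", "LABEL_2": "BUY"}


def readable(prediction: dict) -> dict:
    """Replace the raw pipeline label with its human-readable name."""
    return {
        "label": LABEL_NAMES[prediction["label"]],
        "score": prediction["score"],
    }


def classify(texts: list[str], model_id: str) -> list[dict]:
    """Classify comments and return readable labels with their scores."""
    # Imported lazily so the mapping helper works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [readable(p) for p in classifier(texts)]
```

To try it, call `classify` with a list of comments and pass this repository's Hub id as `model_id`.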

## Evaluation

- Accuracy on the validation dataset: 0.786
- Note: this accuracy was measured on ~1.5k comments.

## Model Card Authors

https://t.me/pivo_txt