Files changed (1)
  1. README.md +42 -11
README.md CHANGED
@@ -19,30 +19,33 @@ widget:
19
  ---
20
 
21
 
 
22
 
23
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
24
- should probably proofread and complete it, then remove this comment. -->
25
 
26
- # vashkontrol-sentiment-rubert
27
 
28
- This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) on an unknown dataset.
29
  It achieves the following results on the evaluation set:
30
  - Loss: 0.1085
31
  - F1: 0.9461
32
 
33
  ## Model description
34
 
35
- More information needed
36
 
37
- ## Intended uses & limitations
38
-
39
- More information needed
40
 
41
  ## Training and evaluation data
42
 
43
- More information needed
 
 
 
 
 
 
 
 
44
 
45
- ## Training procedure
46
 
47
  ### Training hyperparameters
48
 
@@ -71,4 +74,32 @@ The following hyperparameters were used during training:
71
  - Transformers 4.31.0
72
  - Pytorch 2.0.1+cu118
73
  - Datasets 2.14.1
74
- - Tokenizers 0.13.3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
  ---
20
 
21
 
22
+ # Sentiment analysis of reviews from the "VashKontrol" portal
23
 
24
+ This model classifies the sentiment of reviews from the [VashKontrol portal](https://vashkontrol.ru/).
 
25
 
26
+ This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) on the following dataset: [kartashoffv/vash_kontrol_reviews](https://huggingface.co/datasets/kartashoffv/vash_kontrol_reviews).
27
 
 
28
  It achieves the following results on the evaluation set:
29
  - Loss: 0.1085
30
  - F1: 0.9461
31
 
32
  ## Model description
33
 
34
+ The model predicts a sentiment label (positive, neutral, negative) for a submitted text review.
35
 
 
 
 
36
 
37
  ## Training and evaluation data
38
 
39
+ The model was trained on a corpus of reviews that users left on the [VashKontrol portal](https://vashkontrol.ru/) from 2020 through 2022.
40
+ The corpus contains 17,385 reviews. The author labeled the sentiment manually, assigning each review to the positive, neutral, or negative class.
41
+
42
+ The resulting class counts:
43
+ - 0 (positive): 13,045
44
+ - 1 (neutral): 1,196
45
+ - 2 (negative): 3,144
46
+
47
+ Class weighting was used to compensate for the class imbalance.
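The card does not say how the class weights were computed. As an illustration only, one common choice is inverse-frequency ("balanced") weighting, `w_c = N / (K * n_c)`, applied to the class counts listed above. The helper name `balanced_class_weights` is hypothetical, and the weights it produces are not necessarily the ones used to train this model.

```python
# Class counts reported in this model card (label id -> number of reviews).
CLASS_COUNTS = {0: 13045, 1: 1196, 2: 3144}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    Assumption: this is a common heuristic, not a confirmed detail of how
    this model was trained.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
for label, weight in sorted(weights.items()):
    # Rare classes (neutral, negative) receive larger weights than positive.
    print(label, round(weight, 4))
```

Weights like these could be passed, ordered by label id, as the `weight` tensor of `torch.nn.CrossEntropyLoss`; whether training here did exactly that is an assumption.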
48
 
 
49
 
50
  ### Training hyperparameters
51
 
 
74
  - Transformers 4.31.0
75
  - Pytorch 2.0.1+cu118
76
  - Datasets 2.14.1
77
+ - Tokenizers 0.13.3
78
+
79
+
80
+ ### Usage
81
+
82
+ ```
83
+ import torch
84
+ from transformers import AutoModelForSequenceClassification
85
+ from transformers import BertTokenizerFast
86
+
87
+ tokenizer = BertTokenizerFast.from_pretrained('kartashoffv/vashkontrol-sentiment-rubert')
88
+ model = AutoModelForSequenceClassification.from_pretrained('kartashoffv/vashkontrol-sentiment-rubert', return_dict=True)
89
+
90
+ @torch.no_grad()
91
+ def predict(review):
92
+     inputs = tokenizer(review, max_length=512, padding=True, truncation=True, return_tensors='pt')
93
+     outputs = model(**inputs)
94
+     predicted = torch.nn.functional.softmax(outputs.logits, dim=1)
95
+     pred_label = torch.argmax(predicted, dim=1).numpy()
96
+     return pred_label
97
+ ```
98
+ ### Labels
99
+
100
+ ```
101
+ 0: POSITIVE
102
+ 1: NEUTRAL
103
+ 2: NEGATIVE
104
+ ```
105
+
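The `predict` function in the Usage section returns integer label ids. A minimal sketch of turning those ids into the names from the Labels section follows; `ID2LABEL` and `decode_labels` are illustrative names and are not part of the model repository.

```python
# Label ids and names as listed in the Labels section of this card.
ID2LABEL = {0: "POSITIVE", 1: "NEUTRAL", 2: "NEGATIVE"}


def decode_labels(label_ids):
    """Map an iterable of integer label ids (list or NumPy array) to names."""
    # int() makes NumPy integer scalars usable as dictionary keys.
    return [ID2LABEL[int(i)] for i in label_ids]


print(decode_labels([0, 2, 1]))
```

For a single review string, `decode_labels(predict(review))` would give a one-element list, since `predict` returns one label id per input text.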