---
language: ["ru"]
tags:
- russian
- pretraining
- conversational
license: mit
widget:
- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] норм"
  example_title: "Dialog example 1"
- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] ты *****"
  example_title: "Dialog example 2"
---

# response-toxicity-classifier-base

A [BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual dialog data with 4 labels.

# Training

[*Skoltech/russian-inappropriate-messages*](https://huggingface.co/Skoltech/russian-inappropriate-messages) was finetuned on multiclass data with four classes (check the exact mapping between index and label in `model.config`; a quick sketch follows at the end of this section).

1) OK label — the message is acceptable in the given context and does not intend to offend or otherwise harm the speaker's reputation.
2) Toxic label — the message might be seen as offensive in the given context.
3) Severe toxic label — the message is offensive, full of anger, and was written to provoke a fight or other discomfort.
4) Risks label — the message touches on sensitive topics (e.g. religion, politics) and can harm the reputation of the speaker.

The model was finetuned on soon-to-be-posted dialog datasets.
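
Since the exact index-to-label mapping lives in the model config, it can be inspected without loading the weights. A minimal sketch, assuming the mapping is stored in the standard `id2label` field of the `transformers` config:

```python
from transformers import AutoConfig

# Load only the config and print the index-to-label mapping.
config = AutoConfig.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
print(config.id2label)  # e.g. {0: '...', 1: '...', 2: '...', 3: '...'}
```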

# Evaluation results

The model achieves the following results on the validation datasets (to be posted soon):

| Dataset          | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
|------------------|---------------|------------------|-------------------------|------------------|
| internet dialogs | 0.896         | 0.348            | 0.490                   | 0.591            |
| chatbot dialogs  | 0.940         | 0.295            | 0.729                   | 0.460            |
 
# Use in transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')

# Dialog context turns are separated by [SEP]; the candidate response follows
# [RESPONSE_TOKEN]. Special tokens are written out manually, so
# add_special_tokens is disabled.
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?',
                   max_length=128, add_special_tokens=False, return_tensors='pt')

with torch.inference_mode():
    logits = model(**inputs).logits
    # Class probabilities over the four labels (see model.config.id2label).
    probas = torch.softmax(logits, dim=-1)[0].cpu().detach().numpy()
```
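
To turn the probabilities into a label name, the predicted index can be looked up in `model.config.id2label`. The sketch below reuses `tokenizer` and `model` from the block above and shows one way to assemble the input string from a dialog history, assuming the same `[CLS]`/`[SEP]`/`[RESPONSE_TOKEN]` format as the example; `build_input` is a hypothetical helper, not part of the model's API:

```python
def build_input(context, response):
    # Hypothetical helper: join context turns with [SEP] and append the
    # candidate response after [RESPONSE_TOKEN], mirroring the format above.
    return '[CLS]' + '[SEP]'.join(context) + '[RESPONSE_TOKEN]' + response

text = build_input(['привет', 'привет!', 'как дела?'], 'норм, у тя как?')
inputs = tokenizer(text, max_length=128, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    logits = model(**inputs).logits
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```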


This work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast), mentored by [Alexander Markov](https://huggingface.co/amarkv).