	---
	language: ["ru"]
	tags:
	- russian
	- pretraining
	- conversational
	license: mit
	widget:
	- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] норм"
	  example_title: "Dialog example 1"
	---

	# response-toxicity-classifier-base

	[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.

	# Training

	[Skoltech/russian-inappropriate-messages](https://huggingface.co/Skoltech/russian-inappropriate-messages) was finetuned on multiclass data with four classes (check the exact mapping between idx and label in `model.config`).

	1) OK label: the message is OK in context and does not intend to offend or otherwise harm the reputation of the speaker.
	2) Toxic label: the message might be seen as offensive in the given context.
	3) Severe toxic label: the message is offensive, full of anger, and was written to provoke a fight or cause other discomfort.
	4) Risks label: the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).

	The model was finetuned on dialog datasets that will be posted soon.
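
The index-to-label mapping mentioned above can be read from `model.config.id2label` once the model is loaded. The following is a minimal sketch of turning a score vector into a label name. The `ID2LABEL` dictionary and its label strings are placeholders assumed for illustration, not the model's actual mapping, and `top_label` is a hypothetical helper.

```python
# Placeholder mapping for illustration only (an assumption).
# The real mapping must be taken from model.config.id2label.
ID2LABEL = {0: "ok", 1: "toxic", 2: "severe_toxic", 3: "risks"}


def top_label(probas, id2label):
    """Return (label, score) for the highest-scoring class.

    `probas` is any indexable sequence of per-class scores,
    such as a Python list or a 1-D NumPy array.
    """
    best_idx = max(range(len(probas)), key=lambda i: probas[i])
    return id2label[best_idx], float(probas[best_idx])


label, score = top_label([0.10, 0.70, 0.15, 0.05], ID2LABEL)
print(label, score)
```

With the real model, the call would be `top_label(probas, model.config.id2label)`, so the label names come from the checkpoint rather than from a hard-coded dictionary.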

	# Evaluation results

	The model achieves the following results on the validation datasets (to be posted soon):

	| Dataset | OK F1-score | TOXIC F1-score | SEVERE TOXIC F1-score | RISKS F1-score |
	|---------|-------------|----------------|-----------------------|----------------|
	| twitter | 0.896 | 0.348 | 0.490 | 0.591 |
	| chats | 0.940 | 0.295 | 0.729 | 0.46 |
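
For a single summary number per dataset, the per-class scores above can be averaged. The sketch below computes an unweighted (macro) mean of the four reported F1-scores. This average is not a figure reported by the authors, and it ignores how often each class occurs in the data.

```python
# Per-class F1-scores copied from the table above,
# in the order OK, TOXIC, SEVERE TOXIC, RISKS.
f1_scores = {
    "twitter": [0.896, 0.348, 0.490, 0.591],
    "chats": [0.940, 0.295, 0.729, 0.46],
}

# Macro average: plain mean over classes, each class weighted equally.
macro_f1 = {name: sum(scores) / len(scores) for name, scores in f1_scores.items()}

for name, value in macro_f1.items():
    print(f"{name}: {value:.3f}")
```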

	# Use in transformers

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
	model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')

	# Context turns are separated by [SEP]; the response to classify follows [RESPONSE_TOKEN].
	# The special tokens are already written into the string, hence add_special_tokens=False.
	inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
	with torch.inference_mode():
	    logits = model(**inputs).logits
	    # One score per class, in the index order given by model.config
	    probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
	```
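
The input string in the example above is assembled by hand. A small helper can build it from a list of context turns and a response. This is a sketch that copies the token layout of that example; `build_input` is a hypothetical name, and whether the model expects a particular number of context turns is not stated here, so treat that as an open assumption.

```python
def build_input(context, response):
    """Join dialog turns into one string in the layout used by the example.

    context  -- list of previous dialog turns, oldest first
    response -- the reply whose toxicity should be classified
    """
    if not context:
        raise ValueError("context must contain at least one turn")
    # [CLS] opens the sequence, [SEP] separates context turns,
    # and [RESPONSE_TOKEN] marks where the response starts.
    return "[CLS]" + "[SEP]".join(context) + "[RESPONSE_TOKEN]" + response


text = build_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

The resulting string can be passed to the tokenizer with `add_special_tokens=False`, as in the example, because the special tokens are already part of the text.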


	This work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast).