metadata
language:
  - ru
tags:
  - token-classification
license: apache-2.0
widget:
  - text: Ёпта, меня зовут придурок и я живу в жопе

RuBERTConv Toxic Editor

Model description

A token-tagging model for text detoxification, based on rubert-base-cased-conversational.

Each token is assigned one of four classes:

  • Equal = keep the token unchanged
  • Replace = replace the token with a mask
  • Delete = remove the token
  • Insert = insert a mask before the token

Use it together with a mask-filling model, which fills the masks this tagger produces.
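As a rough illustration of the four classes, the sketch below applies one tag per token to build the masked template that a mask filler would then complete. The lowercase label strings, the `[MASK]` token, and the `apply_tags` helper are assumptions made for this example; check the model's `id2label` config and the tokenizer's mask token for the actual values.

```python
from typing import List, Tuple

MASK = "[MASK]"  # assumption: BERT-style mask token


def apply_tags(tagged: List[Tuple[str, str]], mask: str = MASK) -> str:
    """Build a masked template from (token, tag) pairs.

    Hypothetical helper; tag names are assumed to match the four classes
    listed in this card, compared case-insensitively.
    """
    out = []
    for token, tag in tagged:
        tag = tag.lower()
        if tag == "equal":
            out.append(token)           # keep the token as-is
        elif tag == "replace":
            out.append(mask)            # token is swapped for a mask
        elif tag == "delete":
            continue                    # token is dropped
        elif tag == "insert":
            out.extend([mask, token])   # mask goes before the kept token
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return " ".join(out)


example = [
    ("You", "equal"),
    ("idiot", "delete"),
    ("are", "equal"),
    ("dumb", "replace"),
    ("person", "insert"),
]
print(apply_tags(example))  # You are [MASK] [MASK] person
```

Joining with single spaces is a simplification; a real pipeline would more likely rebuild the text from character offsets so that punctuation and spacing survive.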

Intended uses & limitations

How to use

Colab: link

import torch
from transformers import pipeline

tagger_model_name = "IlyaGusev/rubertconv_toxic_editor"

# The pipeline takes a device index: 0 for the first GPU, -1 for CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
device_num = 0 if device == "cuda" else -1

# aggregation_strategy="max" merges sub-word pieces into whole words
# and labels each word by its highest-scoring piece.
tagger_pipe = pipeline(
    "token-classification",
    model=tagger_model_name,
    tokenizer=tagger_model_name,
    framework="pt",
    device=device_num,
    aggregation_strategy="max"
)

text = "..."
# The pipeline accepts a list of texts and returns one prediction list per text.
tagger_predictions = tagger_pipe([text], batch_size=1)
sample_predictions = tagger_predictions[0]
print(sample_predictions)
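To inspect what the tagger wants to change, one option is to list only the spans whose label is not the "keep" class. The sketch below assumes the usual aggregated output of a `transformers` token-classification pipeline (dicts with `entity_group`, `word`, `start`, `end`) and assumes the "keep" label is spelled `equal`; `edited_spans` is a hypothetical helper, not part of this model's code.

```python
from typing import Any, Dict, List, Tuple


def edited_spans(
    text: str,
    predictions: List[Dict[str, Any]],
    keep_label: str = "equal",  # assumption: name of the "no change" class
) -> List[Tuple[str, str]]:
    """Return (label, surface text) for every span that is not kept as-is."""
    spans = []
    for pred in predictions:
        label = pred["entity_group"]
        if label.lower() == keep_label:
            continue
        start, end = pred.get("start"), pred.get("end")
        if start is None or end is None:
            # Offsets can be missing (e.g. with slow tokenizers);
            # fall back to the decoded word.
            surface = pred["word"]
        else:
            surface = text[start:end]
        spans.append((label, surface))
    return spans


# Hand-written predictions in the assumed output format, for illustration only.
demo_text = "you are dumb"
demo_predictions = [
    {"entity_group": "equal", "word": "you are", "start": 0, "end": 7, "score": 0.99},
    {"entity_group": "replace", "word": "dumb", "start": 8, "end": 12, "score": 0.97},
]
print(edited_spans(demo_text, demo_predictions))  # [('replace', 'dumb')]
```

With real output, the call would be `edited_spans(text, sample_predictions)`.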

Training data

Training procedure

Eval results

TBA