metadata

library_name: transformers
license: openrail++
datasets:
  - textdetox/multilingual_paradetox
  - chameleon-lizard/synthetic-multilingual-paradetox
language:
  - ru
  - en
  - am
  - uk
  - de
  - es
  - ar
  - hi
  - zh
pipeline_tag: text2text-generation

Model Card for detox-mt0-xl

A fine-tune of the mt0-xl model for the text detoxification task.

Model Details

Model Description

This is a fine-tune of the mt0-xl model for the text detoxification task. It can also be used to generate synthetic data from toxic examples.

Developed by: Nikita Sushko
Model type: mt5-xl
Language(s) (NLP): English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
License: OpenRail++
Finetuned from model: mt0-xl

Uses

This model is intended for text detoxification in 9 languages: English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi.

Direct Use

The model may be directly used for text detoxification tasks.

How to Get Started with the Model

Use the code below to get started with the model.

import transformers

checkpoint = 'chameleon-lizard/detox-mt0-xl'

tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")

pipe = transformers.pipeline(
    "text2text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_length=512,
    truncation=True,
)

language = 'English'
text = "You are a major fucking disappointment."
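# The prompt must be an f-string so that language and text are actually substituted.
prompt = f'Write a non-toxic version of the following text in {language}: {text}'
print(pipe(prompt)[0]['generated_text'])
# Resulting text: "You are a major disappointment."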

For best performance, use the prompt format shown above. If the target language is omitted from the prompt, the model may respond in an arbitrary language.
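
Because the prompt format matters, it may help to build prompts through a small helper rather than by hand. The sketch below is illustrative and not part of the released code: `build_prompt`, `build_prompts`, and `SUPPORTED_LANGUAGES` are hypothetical names, and the language names are assumed to be the English names listed in this card (the exact strings used during training are not stated here).

```python
# Prompt template copied from the usage example in this card.
PROMPT_TEMPLATE = "Write a non-toxic version of the following text in {language}: {text}"

# Assumption: target languages are given by their English names, as listed in this card.
SUPPORTED_LANGUAGES = {
    "English", "Russian", "Ukrainian", "Amharic", "German",
    "Spanish", "Chinese", "Arabic", "Hindi",
}


def build_prompt(text: str, language: str) -> str:
    """Hypothetical helper: format a single detoxification prompt."""
    if language not in SUPPORTED_LANGUAGES:
        # Fail early instead of letting the model answer in an arbitrary language.
        raise ValueError(f"Unsupported language: {language!r}")
    return PROMPT_TEMPLATE.format(language=language, text=text)


def build_prompts(texts: list[str], language: str) -> list[str]:
    """Hypothetical helper: format a batch of prompts for one target language."""
    return [build_prompt(text, language) for text in texts]
```

The resulting strings can be passed to the pipeline in place of a hand-written prompt, for example `pipe(build_prompts(texts, 'English'))`, which should return one result per input text.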