---
library_name: transformers
license: openrail++
datasets:
- textdetox/multilingual_paradetox
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- ru
- en
- am
- uk
- de
- es
- ar
- hi
- zh
pipeline_tag: text2text-generation
---

# Model Card for detox-mt0-xl

A finetune of the mt0-xl model for the text detoxification task.

## Model Details

### Model Description

This is a finetune of the mt0-xl model for the text detoxification task. It can be used to generate synthetic non-toxic data from toxic examples.

- **Developed by:** Nikita Sushko
- **Model type:** mt5-xl
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** OpenRail++
- **Finetuned from model:** mt0-xl

## Uses

This model is intended for text detoxification in 9 languages: English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, and Hindi.

### Direct Use

The model may be used directly for text detoxification tasks.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers

checkpoint = 'chameleon-lizard/detox-mt0-xl'

tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint, torch_dtype='auto', device_map='auto'
)

pipe = transformers.pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

language = 'English'
text = 'You are a major fucking disappointment.'

# The prompt must name the target language and include the toxic text.
print(pipe(f'Write a non-toxic version of the following text in {language}: {text}')[0]['generated_text'])
# Resulting text: "You are a major disappointment."
```

Be sure to use the provided prompt format for the best performance. Failure to include the target language may result in the model responding in a random language.
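Because the target language is part of the prompt, a convenient pattern is to keep the prompt as a template and fill in the language per example. Below is a minimal sketch of that pattern, reusing the `pipe` object from the snippet above; the sample sentences and languages are illustrative assumptions, not examples from the training data.

```python
# Minimal sketch: detoxify texts in several languages with the same
# pipeline. The prompt template matches the format shown above; the
# example sentences are hypothetical inputs, not dataset samples.
prompt = 'Write a non-toxic version of the following text in {language}: {text}'

examples = [
    ('Russian', 'Ты полный идиот.'),
    ('Spanish', 'Eres un completo inútil.'),
]

for language, text in examples:
    output = pipe(prompt.format(language=language, text=text))
    # Each pipeline call returns a list of dicts with 'generated_text'.
    print(output[0]['generated_text'])
```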