---
library_name: transformers
tags:
- detoxification
- style_transfer
license: mit
datasets:
- textdetox/multilingual_paradetox
language:
- en
- ar
- am
- zh
- uk
- hi
- es
- ru
- de
metrics:
- chrf
pipeline_tag: text2text-generation
---

# mT5-XL Detoxification Baseline

This is a baseline detoxification model trained on the dev part of the released parallel corpus of toxic texts, [MultiParadetox](https://huggingface.co/datasets/textdetox/multilingual_paradetox).


## Model Details

The base model for this fine-tune is [mT5-xl](https://huggingface.co/google/mt5-xl).
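
Since the card declares `library_name: transformers` and `pipeline_tag: text2text-generation`, inference would presumably follow the usual seq2seq pattern. The sketch below is a minimal illustration under assumptions, not the authors' reference code: the helper names `load` and `detoxify` are hypothetical, the repository id must be supplied by the caller, and the input is passed as raw text because this card does not say whether a prompt prefix is expected.

```python
from typing import List


def load(model_id: str):
    """Load tokenizer and model for a seq2seq checkpoint.

    `model_id` is whatever repository id this model is published under;
    it is not stated in this card, so the caller must supply it.
    """
    # Imported lazily so the helper below can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def detoxify(texts: List[str], tokenizer, model, max_new_tokens: int = 64) -> List[str]:
    """Generate a rewritten version for each input text.

    Assumption: inputs are fed as-is, with no language or task prefix.
    """
    # Tokenize the whole batch, padding to the longest item.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Decoding settings are left at their defaults apart from the length cap.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

A call would look like `tokenizer, model = load("<this-model-repo-id>")` followed by `detoxify(["some toxic sentence"], tokenizer, model)`. Note that an XL-sized mT5 checkpoint is large, so loading it on a GPU or in reduced precision is likely to be necessary in practice.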

## Citation

The model was developed as a baseline for the [TextDetox CLEF-2024](https://pan.webis.de/clef24/pan24-web/text-detoxification.html) shared task.

If you would like to acknowledge our work, please cite the following papers:

```
@inproceedings{dementieva2024overview,
  title={Overview of the Multilingual Text Detoxification Task at PAN 2024},
  author={Dementieva, Daryna and Moskovskiy, Daniil and Babakov, Nikolay and Ayele, Abinew Ali and Rizwan, Naquee and Schneider, Florian and Wang, Xintong and Yimam, Seid Muhie and Ustalov, Dmitry and Stakovskii, Elisei and Smirnova, Alisa and Elnagar, Ashraf and Mukherjee, Animesh and Panchenko, Alexander},
  booktitle={Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum},
  editor={Guglielmo Faggioli and Nicola Ferro and Petra Galu{\v{s}}{\v{c}}{\'a}kov{\'a} and Alba Garc{\'i}a Seco de Herrera},
  year={2024},
  organization={CEUR-WS.org}
}
```

```
@inproceedings{DBLP:conf/ecir/BevendorffCCDEFFKMMPPRRSSSTUWZ24,
  author       = {Janek Bevendorff and
                  Xavier Bonet Casals and
                  Berta Chulvi and
                  Daryna Dementieva and
                  Ashraf Elnagar and
                  Dayne Freitag and
                  Maik Fr{\"{o}}be and
                  Damir Korencic and
                  Maximilian Mayerl and
                  Animesh Mukherjee and
                  Alexander Panchenko and
                  Martin Potthast and
                  Francisco Rangel and
                  Paolo Rosso and
                  Alisa Smirnova and
                  Efstathios Stamatatos and
                  Benno Stein and
                  Mariona Taul{\'{e}} and
                  Dmitry Ustalov and
                  Matti Wiegmann and
                  Eva Zangerle},
  editor       = {Nazli Goharian and
                  Nicola Tonellotto and
                  Yulan He and
                  Aldo Lipani and
                  Graham McDonald and
                  Craig Macdonald and
                  Iadh Ounis},
  title        = {Overview of {PAN} 2024: Multi-author Writing Style Analysis, Multilingual
                  Text Detoxification, Oppositional Thinking Analysis, and Generative
                  {AI} Authorship Verification - Extended Abstract},
  booktitle    = {Advances in Information Retrieval - 46th European Conference on Information
                  Retrieval, {ECIR} 2024, Glasgow, UK, March 24-28, 2024, Proceedings,
                  Part {VI}},
  series       = {Lecture Notes in Computer Science},
  volume       = {14613},
  pages        = {3--10},
  publisher    = {Springer},
  year         = {2024},
  url          = {https://doi.org/10.1007/978-3-031-56072-9\_1},
  doi          = {10.1007/978-3-031-56072-9\_1},
  timestamp    = {Fri, 29 Mar 2024 23:01:36 +0100},
  biburl       = {https://dblp.org/rec/conf/ecir/BevendorffCCDEFFKMMPPRRSSSTUWZ24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
```