
Content Warning: This card may contain examples of offensive language that do not reflect the authors’ views.

Model Card for mT5-counternarrative-en

This is an mT5-base text-to-text model fine-tuned to generate counter-narratives against hate speech. It was fine-tuned on the CONAN-EUS splits, which are derived from the original CONAN dataset.
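A minimal inference sketch using the Hugging Face `transformers` library. It assumes the Hub repository id `HiTZ/mt5-counter-narrative-en` (the id listed on the model page) and assumes the model takes the raw hate-speech text as input with no task prefix. The beam-search and length settings are illustrative choices, not the authors' configuration. `transformers` is imported inside the function so the snippet loads even where the library is not installed.

```python
def generate_counter_narrative(
    hate_speech: str,
    model_id: str = "HiTZ/mt5-counter-narrative-en",
    max_new_tokens: int = 128,
    num_beams: int = 4,
) -> str:
    """Generate one counter-narrative for a hate-speech input (sketch).

    Assumptions: the input is the raw HS text with no task prefix, and the
    generation settings are illustrative, not the authors' setup.
    """
    # Lazy import: requires `pip install transformers torch`
    # (mT5 tokenizers may also need `sentencepiece`).
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Encode the hate-speech text as the encoder input of the text-to-text model.
    inputs = tokenizer(hate_speech, return_tensors="pt", truncation=True)
    # Decode a counter-narrative with beam search (illustrative settings).
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_counter_narrative("Muslims do not have anything useful that can enrich our culture.")` returns one generated string, which will vary with the generation settings. The tokenizer and model are loaded on every call only to keep the sketch short; in practice, load them once and reuse them.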

The CONAN (COunter NArratives through Nichesourcing) dataset was published by Chung et al. (2019) and is publicly available at https://github.com/marcoguerini/CONAN.

CONAN-EUS was created by professionally translating all 6654 English hate speech / counter-narrative (HS-CN) pairs of the original CONAN dataset into Basque and Spanish. For our experiments, we created train, validation and test splits such that no HS-CN pair appears in more than one split.

CONAN-EUS split    HS-CN pairs
train              4833
validation          537
test               1278
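The splitting constraint can be sketched as follows. This is not the authors' procedure: it is a minimal illustration, assuming that exact duplicate HS-CN pairs are collapsed before shuffling so that no pair can be assigned to two splits, and reusing the split sizes from the table above. The function name `make_disjoint_splits`, the constant `CONAN_EUS_SIZES` and the fixed seed are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (hate_speech, counter_narrative)


def make_disjoint_splits(
    pairs: List[Pair],
    sizes: Dict[str, int],
    seed: int = 0,
) -> Dict[str, List[Pair]]:
    """Split HS-CN pairs so that no pair occurs in more than one split (hypothetical helper)."""
    # Collapse exact duplicates first, keeping first-seen order, so a repeated
    # pair cannot end up in two different splits.
    unique = list(dict.fromkeys(pairs))
    if sum(sizes.values()) > len(unique):
        raise ValueError("requested split sizes exceed the number of unique pairs")

    # Shuffle deterministically, then cut consecutive, non-overlapping slices.
    rng = random.Random(seed)
    rng.shuffle(unique)

    splits: Dict[str, List[Pair]] = {}
    start = 0
    for name, size in sizes.items():
        splits[name] = unique[start:start + size]
        start += size
    return splits


# Split sizes taken from the CONAN-EUS table.
CONAN_EUS_SIZES = {"train": 4833, "validation": 537, "test": 1278}
```

Calling `make_disjoint_splits(pairs, CONAN_EUS_SIZES)` on a list of (HS, CN) tuples returns a dict of three lists with the sizes from the table. Because the slices never overlap and duplicates are removed beforehand, each pair lands in exactly one split.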

HS-CN example

The CONAN dataset contains HS-CN pairs on the topic of Muslims and Islam. An example of an HS-CN pair in Basque, Spanish and English is shown below:

Basque
HS: Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk.
CN: Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra.

Spanish
HS: Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura.
CN: ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra.

English
HS: Muslims do not have anything useful that can enrich our culture.
CN: What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra.
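To make the text-to-text framing concrete, the sketch below stores one CONAN-EUS pair in the three languages and turns it into an (input, target) example, with the hate speech as the model input and the counter-narrative as the target. The `HSCNPair` class, its language-code keys and the `to_seq2seq_example` helper are hypothetical, and defaulting to the English side for this English model is an assumption based on the model name rather than something stated in this card.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class HSCNPair:
    """One hate-speech / counter-narrative pair keyed by language code (hypothetical schema)."""
    hs: Dict[str, str]  # e.g. {"eu": ..., "es": ..., "en": ...}
    cn: Dict[str, str]


def to_seq2seq_example(pair: HSCNPair, lang: str = "en") -> Tuple[str, str]:
    """Return (source, target): the HS is the model input, the CN the expected output."""
    return pair.hs[lang], pair.cn[lang]


# The example pair from this card, split into its HS and CN parts.
example = HSCNPair(
    hs={
        "eu": "Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk.",
        "es": "Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura.",
        "en": "Muslims do not have anything useful that can enrich our culture.",
    },
    cn={
        "eu": "Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, "
              "higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra.",
        "es": "¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, "
              "higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra.",
        "en": "What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, "
              "Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra.",
    },
)

# Build the English (input, target) example.
source, target = to_seq2seq_example(example)
```

Keeping all three languages in one record mirrors how CONAN-EUS aligns each translated pair with its English original; selecting a different `lang` yields the Basque or Spanish version of the same training example.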

Citation

If you use this model, please cite the following two papers:

@inproceedings{bengoetxea-et-al-2024,
    title = "{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation",
    author = "Bengoetxea, Jaione  and
      Chung, Yi-Ling  and
      Guerini, Marco  and
      Agerri, Rodrigo",
    booktitle = "Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)",
    year = "2024"
}
@inproceedings{chung-etal-2019-conan,
    title = "{CONAN} - {CO}unter {NA}rratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech",
    author = "Chung, Yi-Ling  and
      Kuzmenko, Elizaveta  and
      Tekiroglu, Serra Sinem  and
      Guerini, Marco",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    year = "2019",
    pages = "2819--2829"
}

Contact: Rodrigo Agerri, HiTZ Center - Ixa, University of the Basque Country (UPV/EHU)
