annieske committed on
Commit 0441055
1 Parent(s): 7f44339

Add cite information.

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -25,6 +25,30 @@ Training data: jigsaw_toxicity_pred_fi
 
 Eval data: jigsaw_toxicity_pred_fi
 
+
+### Citing
+
+If you use this model please cite us using the following bibtex.
+
+```
+@inproceedings{eskelinen-etal-2023-toxicity,
+title = "Toxicity Detection in {F}innish Using Machine Translation",
+author = "Eskelinen, Anni and
+Silvala, Laura and
+Ginter, Filip and
+Pyysalo, Sampo and
+Laippala, Veronika",
+booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
+month = may,
+year = "2023",
+address = "T{\'o}rshavn, Faroe Islands",
+publisher = "University of Tartu Library",
+url = "https://aclanthology.org/2023.nodalida-1.68",
+pages = "685--697",
+abstract = "Due to the popularity of social media platforms and the sheer amount of user-generated content online, the automatic detection of toxic language has become crucial in the creation of a friendly and safe digital space. Previous work has been mostly focusing on English leaving many lower-resource languages behind. In this paper, we present novel resources for toxicity detection in Finnish by introducing two new datasets, a machine translated toxicity dataset for Finnish based on the widely used English Jigsaw dataset and a smaller test set of Suomi24 discussion forum comments originally written in Finnish and manually annotated following the definitions of the labels that were used to annotate the Jigsaw dataset. We show that machine translating the training data to Finnish provides better toxicity detection results than using the original English training data and zero-shot cross-lingual transfer with XLM-R, even with our newly annotated dataset from Suomi24.",
+}
+```
+
 ### Usage
 
 the model can be used through a huggingface pipeline: