gilramos committed on
Commit aa35a5a · verified · 1 Parent(s): 8761ba5

Update README.md

Files changed (1)
  1. README.md +13 -2
README.md CHANGED
@@ -37,7 +37,7 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 
  ## Training Data
 
- 229,103 tweets associated with offensive content were used to retrain the base model
+ 229,103 tweets associated with offensive content were used to retrain the base model.
 
  ## Training Hyperparameters
 
@@ -64,7 +64,18 @@ Twitter Test Set:
 
  ## BibTeX Citation
 
- [More Information Needed]
+ @mastersthesis{Matos-Automatic-Hate-Speech-Detection-in-Portuguese-Social-Media-Text,
+ title = {{Automatic Hate Speech Detection in Portuguese Social Media Text}},
+ author = {Matos, Bernardo Cunha},
+ month = nov,
+ year = {2022},
+ abstract = {{Online Hate Speech (HS) has been growing dramatically on social media and its uncontrolled spread has motivated researchers to develop a diversity of methods for its automated detection. However, the detection of online HS in Portuguese still merits further research. To fill this gap, we explored different models that proved to be successful in the literature to address this task. In particular, we have explored models that use the BERT architecture. Beyond testing single-task models we also explored multitask models that use the information on other related categories to learn HS. To better capture the semantics of this type of texts, we developed HateBERTimbau, a retrained version of BERTimbau more directed to social media language including potential HS targeting African descent, Roma, and LGBTQI+ communities. The performed experiments were based on CO-HATE and FIGHT, corpora of social media messages posted by the Portuguese online community that were labelled regarding the presence of HS among other categories.
+ The results achieved show the importance of considering the annotator's agreement on the data used to develop HS detection models. Comparing different subsets of data used for the training of the models it was shown that, in general, a higher agreement on the data leads to better results.
+ HATEBERTimbau consistently outperformed BERTimbau on both datasets confirming that further pre-training of BERTimbau was a successful strategy to obtain a language model more suitable for online HS detection in Portuguese.
+ The implementation of target-specific models, and multitask learning have shown potential in obtaining better results.}},
+ language = {eng},
+ copyright = {embargoed-access},
+ }
 
  ## Acknowledgements
 