FelipeGuerra
/

colombian-spanish-cyberbullying-classifier

Text Classification

Inference Endpoints

Model card Files Files and versions Community

FelipeGuerra commited on Dec 12, 2023

Commit

5161412

·

1 Parent(s): 175fcfa

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -25,7 +25,7 @@ This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://h
 ## Training and evaluation data
-The dataset used was a small one, consisting of 3570 tweets, which were manually labeled as cyberbullying or not cyberbullying. A distinguishing feature of this dataset is that for a given word, there is an annotated tweet labeled as cyberbullying that contains that word, and another tweet labeled as not cyberbullying with the same word. This is made possible because the context in which the same word is used can vary, leading to tweets being classified differently.
 For instance, tweets in the not cyberbullying category predominantly contain obscene words that, in their particular context, do not correspond with cyberbullying. An example is “Marica, se me olvidó ver el partido”. Additionally, the not cyberbullying category, to a lesser extent, includes tweets sourced from trends in the Colombian region. Twitter trends reflect the most popular topics and conversations in a given area at a specific time, essentially capturing what people are discussing and sharing online in that geographical locale.

 ## Training and evaluation data
+[The dataset used consisted of 3570 tweets](https://huggingface.co/datasets/FelipeGuerra/Colombian_Spanish_Cyberbullying_Dataset_1), which were manually labeled as cyberbullying or not cyberbullying. A distinguishing feature of this dataset is that for a given word, there is an annotated tweet labeled as cyberbullying that contains that word, and another tweet labeled as not cyberbullying with the same word. This is made possible because the context in which the same word is used can vary, leading to tweets being classified differently.
 For instance, tweets in the not cyberbullying category predominantly contain obscene words that, in their particular context, do not correspond with cyberbullying. An example is “Marica, se me olvidó ver el partido”. Additionally, the not cyberbullying category, to a lesser extent, includes tweets sourced from trends in the Colombian region. Twitter trends reflect the most popular topics and conversations in a given area at a specific time, essentially capturing what people are discussing and sharing online in that geographical locale.