FelipeGuerra
committed on
Commit 5161412
1 Parent(s): 175fcfa
Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://h
 
 ## Training and evaluation data
 
-The dataset used
+[The dataset used consisted of 3570 tweets](https://huggingface.co/datasets/FelipeGuerra/Colombian_Spanish_Cyberbullying_Dataset_1), which were manually labeled as cyberbullying or not cyberbullying. A distinguishing feature of this dataset is that for a given word, there is an annotated tweet labeled as cyberbullying that contains that word, and another tweet labeled as not cyberbullying with the same word. This is made possible because the context in which the same word is used can vary, leading to tweets being classified differently.
 
 For instance, tweets in the not cyberbullying category predominantly contain obscene words that, in their particular context, do not correspond with cyberbullying. An example is “Marica, se me olvidó ver el partido”. Additionally, the not cyberbullying category, to a lesser extent, includes tweets sourced from trends in the Colombian region. Twitter trends reflect the most popular topics and conversations in a given area at a specific time, essentially capturing what people are discussing and sharing online in that geographical locale.
 