luisespinosa committed on
Commit
9ad0c1d
1 Parent(s): ab65fea

update readme

Files changed (1)
  1. README.md +4 -168
README.md CHANGED
@@ -1,172 +1,8 @@
- # TweetEval
- This is the repository for the [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). _TweetEval_ consists of seven heterogeneous tasks in Twitter, all framed as multi-class tweet classification. All tasks have been unified into the same benchmark, with each dataset presented in the same format and with fixed training, validation and test splits.
 
- # TweetEval: The Benchmark
 
- These are the seven datasets of TweetEval, with their corresponding labels (more details about the format in the [datasets](https://github.com/cardiffnlp/tweeteval/tree/main/datasets) directory):
 
- - **Emotion Recognition**, [SemEval 2018 (Emotion Recognition)](https://www.aclweb.org/anthology/S18-1001/) - 4 labels: `anger`, `joy`, `sadness`, `optimism`
 
- - **Emoji Prediction**, [SemEval 2018 (Emoji Prediction)](https://www.aclweb.org/anthology/S18-1003.pdf) - 20 labels: :heart:, :heart_eyes:, :joy: `...` :evergreen_tree:, :camera:, :stuck_out_tongue_winking_eye:
-
- - **Irony Detection**, [SemEval 2018 (Irony Detection)](https://www.aclweb.org/anthology/S18-1005.pdf) - 2 labels: `irony`, `not irony`
-
- - **Hate Speech Detection**, [SemEval 2019 (Hateval)](https://www.aclweb.org/anthology/S19-2007.pdf) - 2 labels: `hateful`, `not hateful`
-
- - **Offensive Language Identification**, [SemEval 2019 (OffensEval)](https://www.aclweb.org/anthology/S19-2010/) - 2 labels: `offensive`, `not offensive`
-
- - **Sentiment Analysis**, [SemEval 2017 (Sentiment Analysis in Twitter)](https://www.aclweb.org/anthology/S17-2088/) - 3 labels: `positive`, `neutral`, `negative`
-
- - **Stance Detection***, [SemEval 2016 (Detecting Stance in Tweets)](https://www.aclweb.org/anthology/S16-1003/) - 3 labels: `favour`, `neutral`, `against`
-
- **Note***: For stance there are five different target topics (Abortion, Atheism, Climate change, Feminism and Hillary Clinton), each of which contains its own training, validation and test data.
-
- # TweetEval: Leaderboard (Test set)
-
- | Model | Emoji | Emotion | Hate | Irony | Offensive | Sentiment | Stance | ALL(TE) | Reference |
- |----------|------:|--------:|-----:|------:|----------:|----------:|-------:|----:|---------|
- | RoBERTa-Retrained | 31.4 | 78.5 | 52.3 | 61.7 | 80.5 | 72.6 | 69.3 | **65.2** | [TweetEval](https://arxiv.org/pdf/2010.12421.pdf) |
- | RoBERTa-Base | 30.9 | 76.1 | 46.6 | 59.7 | 79.5 | 71.3 | 68 | 61.3 | [TweetEval](https://arxiv.org/pdf/2010.12421.pdf) |
- | RoBERTa-Twitter | 29.3 | 72.0 | 49.9 | 65.4 | 77.1 | 69.1 | 66.7 | 61.0 | [TweetEval](https://arxiv.org/pdf/2010.12421.pdf) |
- | FastText | 25.8 | 65.2 | 50.6 | 63.1 | 73.4 | 62.9 | 65.4 | 58.1 | [TweetEval](https://arxiv.org/pdf/2010.12421.pdf) |
- | LSTM | 24.7 | 66.0 | 52.6 | 62.8 | 71.7 | 58.3 | 59.4 | 56.5 | [TweetEval](https://arxiv.org/pdf/2010.12421.pdf) |
- | SVM | 29.3 | 64.7 | 36.7 | 61.7 | 52.3 | 62.9 | 67.3 | 53.5 | [TweetEval](https://arxiv.org/pdf/2010.12421.pdf) |
-
- **Note***: Check the [reference paper](https://arxiv.org/pdf/2010.12421.pdf) for details on the official metrics for each task
-
- If you would like to have your results added to the leaderboard you can either submit a pull request or send an email to any of the paper authors with results and the predictions of your model. Please also submit a reference to a paper describing your approach.
-
- # Evaluating your system
-
- To evaluate your system, you simply need an individual predictions file for each of the tasks. The format of the predictions file should be the same as the output examples in the predictions folder (one output label per line, in the same order as the original test file). The predictions included as an example correspond to the best model evaluated in the paper, i.e., RoBERTa re-trained on Twitter (RoB-Rt in the paper).
-
- ### Example usage
-
- ```bash
- python evaluation_script.py
- ```
- The script takes the TweetEval gold test labels and the predictions from the "predictions" folder by default, but you can change these with the optional arguments below.
-
- ### Optional arguments
-
- Three optional arguments can be modified:
-
- *--tweeteval_path*: Path to TweetEval datasets. Default: *"./datasets/"*
-
- *--predictions_path*: Path to predictions directory. Default: *"./predictions/"*
-
- *--task*: Use this to get single task detailed results *(emoji|emotion|hate|irony|offensive|sentiment|stance)*. Default: ""
-
- Evaluation script sample usage from the terminal with parameters:
-
- ```bash
- python evaluation_script.py --tweeteval_path ./datasets/ --predictions_path ./predictions/ --task emoji
- ```
- (this command would output the breakdown of the results for the emoji prediction task only)
-
- # Pre-trained models and code
-
- Coming soon! Here we will release all the Twitter-trained RoBERTa models included in the paper and code to evaluate any pre-trained language model in _TweetEval_.
-
- # Citing TweetEval
-
- If you use TweetEval in your research, please use the following `bib` entry to cite the [reference paper](https://arxiv.org/pdf/2010.12421.pdf).
-
- ```
- @inproceedings{barbieri2020tweeteval,
- title={{TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification}},
- author={Barbieri, Francesco and Camacho-Collados, Jose and Espinosa-Anke, Luis and Neves, Leonardo},
- booktitle={Proceedings of Findings of EMNLP},
- year={2020}
- }
- ```
- # License
-
- TweetEval is released without any restrictions, but restrictions may apply to individual tasks (which are derived from existing datasets) or Twitter (main data source). We refer users to the original licenses accompanying each dataset and Twitter regulations.
-
-
- # Citing TweetEval datasets
-
- If you use any of the TweetEval datasets, please cite their original publications:
-
- #### Emotion Recognition:
- ```
- @inproceedings{mohammad2018semeval,
- title={Semeval-2018 task 1: Affect in tweets},
- author={Mohammad, Saif and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},
- booktitle={Proceedings of the 12th international workshop on semantic evaluation},
- pages={1--17},
- year={2018}
- }
-
- ```
- #### Emoji Prediction:
- ```
- @inproceedings{barbieri2018semeval,
- title={Semeval 2018 task 2: Multilingual emoji prediction},
- author={Barbieri, Francesco and Camacho-Collados, Jose and Ronzano, Francesco and Espinosa-Anke, Luis and
- Ballesteros, Miguel and Basile, Valerio and Patti, Viviana and Saggion, Horacio},
- booktitle={Proceedings of The 12th International Workshop on Semantic Evaluation},
- pages={24--33},
- year={2018}
- }
- ```
-
- #### Irony Detection:
- ```
- @inproceedings{van2018semeval,
- title={Semeval-2018 task 3: Irony detection in english tweets},
- author={Van Hee, Cynthia and Lefever, Els and Hoste, V{\'e}ronique},
- booktitle={Proceedings of The 12th International Workshop on Semantic Evaluation},
- pages={39--50},
- year={2018}
- }
- ```
-
- #### Hate Speech Detection:
- ```
- @inproceedings{basile-etal-2019-semeval,
- title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
- author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and
- Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela",
- booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
- year = "2019",
- address = "Minneapolis, Minnesota, USA",
- publisher = "Association for Computational Linguistics",
- url = "https://www.aclweb.org/anthology/S19-2007",
- doi = "10.18653/v1/S19-2007",
- pages = "54--63"
- }
- ```
- #### Offensive Language Identification:
- ```
- @inproceedings{zampieri2019semeval,
- title={SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)},
- author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh},
- booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation},
- pages={75--86},
- year={2019}
- }
- ```
-
- #### Sentiment Analysis:
- ```
- @inproceedings{rosenthal2017semeval,
- title={SemEval-2017 task 4: Sentiment analysis in Twitter},
- author={Rosenthal, Sara and Farra, Noura and Nakov, Preslav},
- booktitle={Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017)},
- pages={502--518},
- year={2017}
- }
- ```
-
- #### Stance Detection:
- ```
- @inproceedings{mohammad2016semeval,
- title={Semeval-2016 task 6: Detecting stance in tweets},
- author={Mohammad, Saif and Kiritchenko, Svetlana and Sobhani, Parinaz and Zhu, Xiaodan and Cherry, Colin},
- booktitle={Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)},
- pages={31--41},
- year={2016}
- }
- ```
 
+ # Twitter-roBERTa-base
 
+ This is a RoBERTa-base model trained on ~58M tweets, described and evaluated in the [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). To evaluate this and other LMs on Twitter-specific data, please refer to the [official TweetEval repository](https://github.com/cardiffnlp/tweeteval).
 
 
+ ## MLM example
 
+ blabla
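
The MLM section added by this commit is still a placeholder. As an illustration only, the following is a minimal sketch of masked-word prediction with this kind of model. It assumes the Hugging Face `transformers` library is installed and that the model is published under the identifier `cardiffnlp/twitter-roberta-base`; that identifier does not appear in this README, so treat it as an assumption. The helper names `mask_word` and `top_predictions` are hypothetical.

```python
def mask_word(text: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` in `text` with the mask token.

    "<mask>" is the mask token used by RoBERTa tokenizers.
    """
    if word not in text:
        raise ValueError(f"{word!r} does not occur in the text")
    return text.replace(word, mask_token, 1)


def top_predictions(masked_text: str,
                    model_name: str = "cardiffnlp/twitter-roberta-base",  # assumed model id
                    top_k: int = 5):
    """Return (token, score) pairs for the single masked position.

    transformers is imported lazily so that `mask_word` works without it.
    The first call downloads the model weights.
    """
    from transformers import pipeline  # needs `pip install transformers torch`

    fill_mask = pipeline("fill-mask", model=model_name)
    results = fill_mask(masked_text, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]


# Build a masked input; no model is loaded at this point.
masked = mask_word("I am so happy today!", "happy")
print(masked)  # I am so <mask> today!
```

Calling `top_predictions(masked)` would then load the model and return candidate fillers with their scores. No output is shown for that call because it depends on the downloaded weights.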