Pablogps committed on
Commit
0e4ce0c
1 Parent(s): 6469d9a

Update README.md

Files changed (1)
  1. README.md +17 -44
README.md CHANGED
@@ -155,7 +155,7 @@ Our final models were trained on a different number of steps and sequence length

 <figure>

- <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128).</caption>

 | Dataset | Metric | RoBERTa-b | RoBERTa-l | BETO | mBERT | BERTIN |
 |-------------|----------|-----------|-----------|--------|--------|--------|
@@ -170,11 +170,11 @@ Our final models were trained on a different number of steps and sequence length

 </figure>

- All of our models attained good accuracy values, in the range of 0.65, as can be seen in Table 2:

 <figure>

- <caption>Table 2. Accuracy for the different language models.</caption>

 | Model | Accuracy |
 |----------------------------------------------------|----------|
@@ -187,6 +187,8 @@ All of our models attained good accuracy values, in the range of 0.65, as can be

 </figure>

 We are currently in the process of applying our language models to downstream tasks.
 For simplicity, we will abbreviate the different models as follows:
 * **BERT-m**: bert-base-multilingual-cased
@@ -202,55 +204,26 @@ For simplicity, we will abbreviate the different models as follows:
 <figure>

 <caption>
- Table 3. Metrics for different downstream tasks, comparing our different models as well as other relevant BERT variations from the literature. Dataset for POS nad NER is CoNLL 2002.
 </caption>

 | Model | POS (F1/Acc) | NER (F1/Acc) | PAWS-X (Acc) | XNLI-256 (Acc) | XNLI-512 (Acc) |
 |--------------|-------------------------|----------------------|--------------|--------------|--------------|
- | BERT-m | 0.9629 / 0.9687 | 0.8539 / 0.9779 | | | |
- | BERT-wwm | 0.9642 / 0.9700 | 0.8579 / 0.9783 | | | |
- | BSC-BNE | 0.9659 / 0.9707 | 0.8700 / 0.9807 | | | |
- | Beta | 0.9638 / 0.9690 | 0.8725 / 0.9812 | | | |
- | Random | 0.9656 / 0.9704 | 0.8704 / 0.9807 | | | |
- | Stepwise | 0.9656 / 0.9707 | 0.8705 / 0.9809 | | | |
- | Gaussian | 0.9662 / 0.9709 | **0.8792 / 0.9816** | | | |
- | Random-512 | 0.9660 / 0.9707 | 0.8616 / 0.9803 | | | |
- | Gaussian-512 | **0.9662 / 0.9714** | **0.8764 / 0.9819** | | | |

 </figure>

- ### SQUAD-es
- Using sequence length 128 we have achieved exact match 50.96 and F1 68.74.
-
-
- POS
- All models trained with max length 512 and batch size 8, using the CoNLL 2002 dataset.
-
- NER
- All models trained with max length 512 and batch size 8, using the CoNLL 2002 dataset.
-
- ## PAWS-X
- All models trained with max length 512 and batch size 8. These numbers are surprising both for the repeated instances of 0.5765 accuracy and for the large differences in performance. However, experiments have been repeated several times and the results are consistent.
-
- <figure>
-
- <caption>Table 5. Results for PAWS-X.</caption>
-
- | Model | Accuracy |
- |----------------------------------------------------|----------|
- | bert-base-multilingual-cased | 0.5765 |
- | dccuchile/bert-base-spanish-wwm-cased | 0.8720 |
- | BSC-TeMU/roberta-base-bne | 0.5765 |
- | bertin-project/bertin-roberta-base-spanish | 0.5765 |
- | bertin-project/bertin-base-random | 0.8800 |
- | bertin-project/bertin-base-stepwise | 0.8825 |
- | bertin-project/bertin-base-gaussian | 0.8875 |
- | bertin-project/bertin-base-random-exp-512seqlen | 0.6735 |
- | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.8965** |
-
- </figure>
-

 ### XNLI


 <figure>

+ <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128), from their [preprint](https://arxiv.org/pdf/2107.07253.pdf).</caption>

 | Dataset | Metric | RoBERTa-b | RoBERTa-l | BETO | mBERT | BERTIN |
 |-------------|----------|-----------|-----------|--------|--------|--------|

 </figure>

+ All of our models attained good accuracy values during training on the masked-language model task, in the range of 0.65, as can be seen in Table 2:

 <figure>

+ <caption>Table 2. Accuracy of the different language models on the main masked-language model task.</caption>

 | Model | Accuracy |
 |----------------------------------------------------|----------|

 </figure>
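As a quick sanity check of the masked-language model task, the trained checkpoints can be queried directly from the Hugging Face hub. A minimal sketch, assuming the `transformers` library and one of the model ids listed in the tables (the example sentence is illustrative):

```python
from transformers import pipeline

# Load one of the BERTIN checkpoints listed above for masked-token prediction.
fill_mask = pipeline(
    "fill-mask",
    model="bertin-project/bertin-base-gaussian",
)

# RoBERTa-style tokenizers use <mask> as the mask token.
for prediction in fill_mask("Fui a la librería a comprar un <mask>."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.4f}")
```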

+ ### Downstream Tasks
+
 We are currently in the process of applying our language models to downstream tasks.
 For simplicity, we will abbreviate the different models as follows:
 * **BERT-m**: bert-base-multilingual-cased
 <figure>

 <caption>
+ Table 3. Metrics for different downstream tasks, comparing our different models as well as other relevant BERT variations from the literature. The dataset for POS and NER is CoNLL 2002. POS, NER, and PAWS-X used max length 512 and batch size 8.
 </caption>

 | Model | POS (F1/Acc) | NER (F1/Acc) | PAWS-X (Acc) | XNLI-256 (Acc) | XNLI-512 (Acc) |
 |--------------|-------------------------|----------------------|--------------|--------------|--------------|
+ | BERT-m | 0.9629 / 0.9687 | 0.8539 / 0.9779 | 0.5765 | | |
+ | BERT-wwm | 0.9642 / 0.9700 | 0.8579 / 0.9783 | 0.8720 | | |
+ | BSC-BNE | 0.9659 / 0.9707 | 0.8700 / 0.9807 | 0.5765 | | |
+ | Beta | 0.9638 / 0.9690 | 0.8725 / 0.9812 | 0.5765 | | |
+ | Random | 0.9656 / 0.9704 | 0.8704 / 0.9807 | 0.8800 | | |
+ | Stepwise | 0.9656 / 0.9707 | 0.8705 / 0.9809 | 0.8825 | | |
+ | Gaussian | 0.9662 / 0.9709 | **0.8792 / 0.9816** | 0.8875 | | |
+ | Random-512 | 0.9660 / 0.9707 | 0.8616 / 0.9803 | 0.6735 | | |
+ | Gaussian-512 | **0.9662 / 0.9714** | **0.8764 / 0.9819** | **0.8965** | | |

 </figure>
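For reference, the POS and NER numbers above come from standard token-classification fine-tuning. A minimal sketch of that setup, assuming the `datasets` and `transformers` libraries, the `conll2002` dataset with its `es` configuration, and one of the checkpoints from the table (the output directory name and epoch count are placeholders):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "bertin-project/bertin-base-gaussian"  # any checkpoint from Table 3

dataset = load_dataset("conll2002", "es")
labels = dataset["train"].features["ner_tags"].feature.names

# add_prefix_space is required to tokenize pre-split words with BPE tokenizers.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(labels)
)

def tokenize_and_align(examples):
    # Align labels to the first sub-token of each word; mask the rest
    # with -100 so they are ignored by the loss.
    tokenized = tokenizer(
        examples["tokens"],
        truncation=True,
        max_length=512,  # max length from the Table 3 caption
        is_split_into_words=True,
    )
    aligned = []
    for i, tags in enumerate(examples["ner_tags"]):
        previous, label_row = None, []
        for word_id in tokenized.word_ids(batch_index=i):
            if word_id is None or word_id == previous:
                label_row.append(-100)
            else:
                label_row.append(tags[word_id])
            previous = word_id
        aligned.append(label_row)
    tokenized["labels"] = aligned
    return tokenized

encoded = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bertin-ner",               # placeholder
        per_device_train_batch_size=8,         # batch size from the caption
        num_train_epochs=3,                    # placeholder
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```

The same setup works for POS tagging by swapping `ner_tags` for `pos_tags`.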

+ In addition to the tasks above, we also trained the beta model on the SQuAD dataset, achieving an exact match of 50.96 and an F1 of 68.74 (sequence length 128). A full evaluation of this task is still pending.
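Once such a checkpoint is saved, it can be exercised with the standard question-answering pipeline. A minimal sketch, assuming a locally saved fine-tuned model (the path and the question/context pair are placeholders):

```python
from transformers import pipeline

# Placeholder path to a checkpoint fine-tuned on SQuAD as described above.
qa = pipeline("question-answering", model="./bertin-beta-squad")

result = qa(
    question="¿Dónde nació Miguel de Cervantes?",
    context="Miguel de Cervantes nació en Alcalá de Henares en 1547.",
)
print(result["answer"], result["score"])  # predicted span and its confidence
```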

+ To note: fine-tuning involved no intensive tuning of hyperparameters or number of epochs, yet the results are already good. The PAWS-X numbers are surprising, both for the large differences in performance and for the repeated baseline value of 0.5765; however, the experiments have been repeated several times with only minor differences, and the results are consistent.

 ### XNLI