Pablogps committed on
Commit 96e881d
1 Parent(s): 5de6912

Update README.md

Files changed (1)
  1. README.md +9 -15
README.md CHANGED
@@ -148,6 +148,7 @@ Our final models were trained on a different number of steps and sequence length
 
  <figure>
 
+ <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128).</caption>
  | Dataset | Metric | RoBERTa-b | RoBERTa-l | BETO | mBERT | BERTIN |
  |-------------|----------|-----------|-----------|--------|--------|--------|
  | UD-POS | F1 | **0.9907** | 0.9901 | 0.9900 | 0.9886 | **0.9904** |
@@ -159,14 +160,13 @@ Our final models were trained on a different number of steps and sequence length
  | PAWS-X | F1 | 0.9035 | 0.9000 | 0.8915 | 0.9020 | 0.8820 |
  | XNLI | Accuracy | 0.8016 | WiP | 0.8130 | 0.7876 | WiP |
 
-
- <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128).</caption>
  </figure>
 
  All of our models attained good accuracy values, in the range of 0.65, as can be seen in Table 2:
 
  <figure>
 
+ <caption>Table 2. Accuracy for the different language models.</caption>
  | Model | Accuracy |
  |----------------------------------------------------|----------|
  | bertin-project/bertin-roberta-base-spanish | 0.6547 |
@@ -176,8 +176,6 @@ All of our models attained good accuracy values, in the range of 0.65, as can be
  | bertin-project/bertin-base-random-exp-512seqlen | 0.5907 |
  | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.6873** |
 
-
- <caption>Table 2. Accuracy for the different language models.</caption>
  </figure>
 
  We are currently in the process of applying our language models to downstream tasks.
@@ -192,6 +190,7 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
 
  <figure>
 
+ <caption>Table 3. Results for POS.</caption>
  | Model | F1 | Accuracy |
  |----------------------------------------------------|----------|----------|
  | bert-base-multilingual-cased | 0.9629 | 0.9687 |
@@ -204,8 +203,6 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
  | bertin-project/bertin-base-random-exp-512seqlen | 0.9660 | 0.9707 |
  | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.9662** | **0.9714** |
 
-
- <caption>Table 3. Results for POS.</caption>
  </figure>
 
 
@@ -214,6 +211,7 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
 
  <figure>
 
+ <caption>Table 4. Results for NER.</caption>
  | Model | F1 | Accuracy |
  |----------------------------------------------------|----------|----------|
  | bert-base-multilingual-cased | 0.8539 | 0.9779 |
@@ -226,8 +224,6 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
  | bertin-project/bertin-base-random-exp-512seqlen | 0.8616 | 0.9803 |
  | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.8764** | **0.9819** |
 
-
- <caption>Table 4. Results for NER.</caption>
  </figure>
 
 
@@ -236,6 +232,7 @@ All models trained with max length 512 and batch size 8. These numbers are surpr
 
  <figure>
 
+ <caption>Table 5. Results for PAWS-X.</caption>
  | Model | Accuracy |
  |----------------------------------------------------|----------|
  | bert-base-multilingual-cased | 0.5765 |
@@ -248,8 +245,6 @@ All models trained with max length 512 and batch size 8. These numbers are surpr
  | bertin-project/bertin-base-random-exp-512seqlen | 0.6735 |
  | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.8965** |
 
-
- <caption>Table 5. Results for PAWS-X.</caption>
  </figure>
 
 
@@ -257,6 +252,7 @@ All models trained with max length 512 and batch size 8. These numbers are surpr
 
  <figure>
 
+ <caption>Table 6. Results for XNLI with sequence length 256 and batch size 32.</caption>
  | Model | Accuracy |
  |----------------------------------------------------|----------|
  | bert-base-multilingual-cased | 0.7852 |
@@ -268,13 +264,14 @@ All models trained with max length 512 and batch size 8. These numbers are surpr
  | bertin-project/bertin-base-random-exp-512seqlen | 0.7723 |
  | bertin-project/bertin-base-gaussian-exp-512seqlen | 0.7878 |
 
-
- <caption>Table 6. Results for XNLI with sequence length 256 and batch size 32.</caption>
  </figure>
 
 
  <figure>
 
+ <caption>Table 7. Results for XNLI with sequence length 512 and batch size 16.</caption>
+ </figure>
+
  | Model | Accuracy |
  |----------------------------------------------------|----------|
  | bert-base-multilingual-cased | WIP |
@@ -287,9 +284,6 @@ All models trained with max length 512 and batch size 8. These numbers are surpr
  | bertin-project/bertin-base-gaussian-exp-512seqlen | 0.7843 |
 
 
- <caption>Table 7. Results for XNLI with sequence length 512 and batch size 16.</caption>
- </figure>
-
  # Conclusions
 
  With roughly 10 days worth of access to 3xTPUv3-8, we have achieved remarkable results surpassing previous state of the art in a few tasks, and even improving document classification on models trained in massive supercomputers with very large—private—and highly curated datasets.