Pablogps committed 3bf5e63 (1 parent: f53cd48)

Update README.md

Files changed (1)
  1. README.md +1 -6
README.md CHANGED
@@ -135,7 +135,7 @@ For `Gaussian` sampling we started a new optimizer after 230 steps with 128 seq

 ## Results

-Our first test, tagged `beta` in this repository, refers to an initial experiment using `Stepwise` on 128 sequence length and trained for 210k steps. Two nearly identical versions of this model can be found, one at **bertin-project** and the other at **flax-community/bertin-roberta-large-spanish** (do note this is **not our best model**!). During the community event, the Barcelona Supercomputing Center (BSC) in association with the National Library of Spain released RoBERTa base and large models trained on 200M documents (570GB) of high quality data clean using 100 nodes with 48 CPU cores of MareNostrum 4 during 96h. At the end of the process they were left with 2TB of clean data at the document level that were further cleaned up to the final 570GB. This is an interesting contrast to our own resources (3xTPUv3-8 for 10 days to do cleaning, sampling, taining, and evaluation) and makes for a valuable reference. The BSC team evaluated our early release of the model `beta` and the results can be seen in Table 1.
+Our first test, tagged `beta` in this repository, refers to an initial experiment using `Stepwise` on 128 sequence length and trained for 210k steps. Two nearly identical versions of this model can be found, one at **bertin-roberta-base-spanish** and the other at **flax-community/bertin-roberta-large-spanish** (do note this is **not our best model**!). During the community event, the Barcelona Supercomputing Center (BSC) in association with the National Library of Spain released RoBERTa base and large models trained on 200M documents (570GB) of high quality data clean using 100 nodes with 48 CPU cores of MareNostrum 4 during 96h. At the end of the process they were left with 2TB of clean data at the document level that were further cleaned up to the final 570GB. This is an interesting contrast to our own resources (3xTPUv3-8 for 10 days to do cleaning, sampling, taining, and evaluation) and makes for a valuable reference. The BSC team evaluated our early release of the model `beta` and the results can be seen in Table 1.

 Our final models were trained on a different number of steps and sequence lengths and achieve different—higher—masked-word prediction accuracies. Despite these limitations it is interesting to see the results they obtained using the early version of our model. Note that some of the datasets used for evaluation by BSC are not freely available, therefore it is not possible to verify the figures.

@@ -162,7 +162,6 @@ All of our models attained good accuracy values, in the range of 0.65, as can be

 | Model || Accuracy |
 |----------------------------------------------------|----------|
-| flax-community/bertin-roberta-large-spanish | 0.6537 |
 | bertin-project/bertin-roberta-base-spanish | 0.6547 |
 | bertin-project/bertin-base-random | 0.6520 |
 | bertin-project/bertin-base-stepwise | 0.6487 |
@@ -189,7 +188,6 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
 | bert-base-multilingual-cased | 0.9629 | 0.9687 |
 | dccuchile/bert-base-spanish-wwm-cased | 0.9642 | 0.9700 |
 | BSC-TeMU/roberta-base-bne | 0.9659 | 0.9707 |
-| flax-community/bertin-roberta-large-spanish | 0.9646 | 0.9697 |
 | bertin-project/bertin-roberta-base-spanish | 0.9638 | 0.9690 |
 | bertin-project/bertin-base-random | 0.9656 | 0.9704 |
 | bertin-project/bertin-base-stepwise | 0.9656 | 0.9707 |
@@ -212,7 +210,6 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
 | bert-base-multilingual-cased | 0.8539 | 0.9779 |
 | dccuchile/bert-base-spanish-wwm-cased | 0.8579 | 0.9783 |
 | BSC-TeMU/roberta-base-bne | 0.8700 | 0.9807 |
-| flax-community/bertin-roberta-large-spanish | 0.8735 | 0.9806 |
 | bertin-project/bertin-roberta-base-spanish | 0.8725 | 0.9812 |
 | bertin-project/bertin-base-random | 0.8704 | 0.9807 |
 | bertin-project/bertin-base-stepwise | 0.8705 | 0.9809 |
@@ -235,7 +232,6 @@ All models trained with max length 512 and batch size 8. The accuracy values in
 | bert-base-multilingual-cased | 0.5765 |
 | dccuchile/bert-base-spanish-wwm-cased | 0.5765 |
 | BSC-TeMU/roberta-base-bne | 0.5765 |
-| flax-community/bertin-roberta-large-spanish | 0.5765 |
 | bertin-project/bertin-roberta-base-spanish | 0.6550 |
 | bertin-project/bertin-base-random | 0.8665 |
 | bertin-project/bertin-base-stepwise | 0.8610 |
@@ -257,7 +253,6 @@ All models trained with max length 256 and batch size 16.
 | bert-base-multilingual-cased | WIP |
 | dccuchile/bert-base-spanish-wwm-cased | WIP |
 | BSC-TeMU/roberta-base-bne | WIP |
-| flax-community/bertin-roberta-large-spanish | WIP |
 | bertin-project/bertin-roberta-base-spanish | WIP |
 | bertin-project/bertin-base-random | 0.7745 |
 | bertin-project/bertin-base-stepwise | 0.7820 |
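Judging by the README text in this diff, the accuracy values around 0.65 in the first table are masked-word prediction accuracies. As a hedged illustration of what such a number usually measures, here is a minimal Python sketch; it is not the BERTIN evaluation code. It assumes predictions are already argmax token ids and that unmasked positions are labelled `-100`, the convention used by the Hugging Face `transformers` language-modeling data collator. The function name `masked_lm_accuracy` is hypothetical.

```python
from typing import Sequence

# Assumption: unmasked positions carry this label, as in the Hugging Face
# transformers language-modeling data collator.
IGNORE_INDEX = -100


def masked_lm_accuracy(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    ignore_index: int = IGNORE_INDEX,
) -> float:
    """Share of masked positions where the predicted token id matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same batch size")
    correct = 0
    total = 0
    for pred_row, label_row in zip(predictions, labels):
        if len(pred_row) != len(label_row):
            raise ValueError("each prediction row must match its label row length")
        for pred, label in zip(pred_row, label_row):
            if label == ignore_index:
                continue  # token was not masked, so it is not scored
            total += 1
            correct += int(pred == label)
    if total == 0:
        raise ValueError("no masked positions to score")
    return correct / total


if __name__ == "__main__":
    # Toy batch of two sequences; predictions are argmax token ids.
    preds = [[5, 17, 9, 4], [8, 2, 2, 11]]
    labels = [[-100, 17, -100, 3], [8, -100, -100, 11]]
    print(masked_lm_accuracy(preds, labels))  # 3 of 4 masked tokens correct -> 0.75
```

Only positions with a real label are scored, so the metric reflects how often the model recovers a hidden word rather than how often it reproduces tokens it could already see.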