Pablogps committed on
Commit
0731b1a
1 Parent(s): 30f98d9

Update README.md

Files changed (1)
  1. README.md +25 -7
README.md CHANGED
@@ -120,7 +120,7 @@ for split in ("random", "stepwise", "gaussian"):
 
 We then used the same setup as Liu et al. (2019) but trained only for half the steps (250k) on a sequence length of 128. In particular, `Gaussian` trained for the full 250k steps, while `Random` was stopped at 230k and `Stepwise` at 180k (a decision based on an analysis of training performance and the computational resources available at the time).
 
- Then, we continued training the most promising model for a few more steps (~25k) on sequence length 512. We tried two strategies for this, since it is not easy to find clear details about this change in the literature.
 
 For `Random` sampling we trained with seq len 512 during the last 20k of the 250k training steps, keeping the optimizer state intact. Results for this are underwhelming, as seen in Figure 7:
 
@@ -131,7 +131,7 @@ For `Random` sampling we trained with seq len 512 during the last 20 steps of th
 <caption>Figure 7. Training profile for Random sampling. Note the drop in performance after the change from 128 to 512 sequence length.</caption>
 </figure>
 
- For `Gaussian` sampling we started a new optimizer after 230k steps with 128 seq len, using a short warmup interval. Results are much better (we do not have a graph since training needed to be restarted several times).
 
 ## Results
 
@@ -153,7 +153,25 @@ Our final models were trained on a different number of steps and sequence length
 | XNLI | Accuracy | 0.8016 | WiP | 0.8130 | 0.7876 | WiP |
 
 
- <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta).</caption>
 </figure>
 
 We are currently in the process of applying our language models to downstream tasks.
@@ -180,7 +198,7 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
 | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.9662** | **0.9714** |
 
 
- <caption>Table 2. Results for POS.</caption>
 </figure>
 
 
@@ -203,7 +221,7 @@ All models trained with max length 512 and batch size 8, using the CoNLL 2002 da
 | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.8764** | **0.9819** |
 
 
- <caption>Table 3. Results for NER.</caption>
 </figure>
 
 
@@ -226,7 +244,7 @@ All models trained with max length 512 and batch size 8. The accuracy values in
 | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.875** |
 
 
- <caption>Table 4. Results for PAWS-X.</caption>
 </figure>
 
 **CNLI**
@@ -248,7 +266,7 @@ All models trained with max length 256 and batch size 16.
 | bertin-project/bertin-base-gaussian-exp-512seqlen | 0.7878 |
 
 
- <caption>Table 5. Results for CNLI.</caption>
 </figure>
 
 # Conclusions
 
 We then used the same setup as Liu et al. (2019) but trained only for half the steps (250k) on a sequence length of 128. In particular, `Gaussian` trained for the full 250k steps, while `Random` was stopped at 230k and `Stepwise` at 180k (a decision based on an analysis of training performance and the computational resources available at the time).
 
+ Then, we continued training the most promising model for a few more steps (~25k) on sequence length 512. We tried two strategies for this, since it is not easy to find clear details about this change in the literature. It turns out this decision had a big impact on final performance.
 
 For `Random` sampling we trained with seq len 512 during the last 20k of the 250k training steps, keeping the optimizer state intact. Results for this are underwhelming, as seen in Figure 7:
 
 <caption>Figure 7. Training profile for Random sampling. Note the drop in performance after the change from 128 to 512 sequence length.</caption>
 </figure>
 
+ For `Gaussian` sampling we started a new optimizer after 230k steps with 128 seq len, using a short warmup interval. Results are much better using this procedure. We do not have a graph since training needed to be restarted several times; however, the final accuracy was 0.6873, compared to 0.5907 for `Random` (512), a difference much larger than that between their respective -128 models (0.6520 for `Random`, 0.6608 for `Gaussian`).
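
The restart can be sketched as a small learning-rate schedule applied to a freshly initialized optimizer: linear warmup to a peak value, then decay. The peak learning rate, warmup length, and decay shape below are illustrative assumptions only; the text specifies nothing beyond "a short warmup interval":

```python
def restart_lr_schedule(step, peak_lr=6e-4, warmup_steps=500, total_steps=25_000):
    """Learning rate for the seq-len-512 continuation phase.

    Linear warmup from 0 to peak_lr over warmup_steps, then linear decay
    to 0 at total_steps. All three constants are hypothetical values,
    not the ones actually used in training.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

With a fresh optimizer, the Adam moment estimates are rebuilt entirely on 512-token batches, which may be why this worked better than carrying over state from the 128-token phase.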
 
 ## Results
 
 | XNLI | Accuracy | 0.8016 | WiP | 0.8130 | 0.7876 | WiP |
 
 
+ <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128).</caption>
+ </figure>
+
+ All of our models attained good accuracy values, mostly in the range of 0.65, as can be seen in Table 2:
+
+ <figure>
+
+ | Model | Accuracy |
+ |----------------------------------------------------|----------|
+ | flax-community/bertin-roberta-large-spanish | 0.6537 |
+ | bertin-project/bertin-roberta-base-spanish | 0.6547 |
+ | bertin-project/bertin-base-random | 0.6520 |
+ | bertin-project/bertin-base-stepwise | 0.6487 |
+ | bertin-project/bertin-base-gaussian | 0.6608 |
+ | bertin-project/bertin-base-random-exp-512seqlen | 0.5907 |
+ | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.6873** |
+
+
+ <caption>Table 2. Accuracy for the different language models.</caption>
 </figure>
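
The effect of the 512-phase strategy can be made explicit from the Table 2 figures; a small sketch comparing the `Gaussian`-vs-`Random` gap before and after:

```python
# MLM accuracy figures copied from Table 2.
accuracy = {
    "bertin-base-random":                 0.6520,
    "bertin-base-gaussian":               0.6608,
    "bertin-base-random-exp-512seqlen":   0.5907,
    "bertin-base-gaussian-exp-512seqlen": 0.6873,
}

# Gap between Gaussian and Random sampling, before and after the
# seq-len-512 continuation phase.
gap_128 = accuracy["bertin-base-gaussian"] - accuracy["bertin-base-random"]
gap_512 = (accuracy["bertin-base-gaussian-exp-512seqlen"]
           - accuracy["bertin-base-random-exp-512seqlen"])

print(round(gap_128, 4), round(gap_512, 4))
```

The gap grows by roughly an order of magnitude, consistent with the optimizer-restart strategy (not the sampling method alone) driving the difference.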
 
 We are currently in the process of applying our language models to downstream tasks.
 | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.9662** | **0.9714** |
 
 
+ <caption>Table 3. Results for POS.</caption>
 </figure>
 
 
 | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.8764** | **0.9819** |
 
 
+ <caption>Table 4. Results for NER.</caption>
 </figure>
 
 
 | bertin-project/bertin-base-gaussian-exp-512seqlen | **0.875** |
 
 
+ <caption>Table 5. Results for PAWS-X.</caption>
 </figure>
 
 **CNLI**
 | bertin-project/bertin-base-gaussian-exp-512seqlen | 0.7878 |
 
 
+ <caption>Table 6. Results for CNLI.</caption>
 </figure>
 
 # Conclusions