stefan-it committed on
Commit c167da3
1 Parent(s): 8b4cdf3

readme: remove information about monolingual language models

Files changed (1)
  1. README.md +0 -84
README.md CHANGED
@@ -191,90 +191,6 @@ The following plot shows the pretraining loss curve:
 
  ![Training loss curve](stats/figures/pretraining_loss_historic-multilingual.png)
 
- ## English model
-
- The English BERT model - with texts from British Library corpus - was trained with the Hugging Face
- JAX/FLAX implementation for 10 epochs (approx. 1M steps) on a v3-8 TPU, using the following command:
-
- ```bash
- python3 run_mlm_flax.py --model_type bert \
- --config_name /mnt/datasets/bert-base-historic-english-cased/ \
- --tokenizer_name /mnt/datasets/bert-base-historic-english-cased/ \
- --train_file /mnt/datasets/bl-corpus/bl_1800-1900_extracted.txt \
- --validation_file /mnt/datasets/bl-corpus/english_validation.txt \
- --max_seq_length 512 \
- --per_device_train_batch_size 16 \
- --learning_rate 1e-4 \
- --num_train_epochs 10 \
- --preprocessing_num_workers 96 \
- --output_dir /mnt/datasets/bert-base-historic-english-cased-512-noadafactor-10e \
- --save_steps 2500 \
- --eval_steps 2500 \
- --warmup_steps 10000 \
- --line_by_line \
- --pad_to_max_length
- ```
-
- The following plot shows the pretraining loss curve:
-
- ![Training loss curve](stats/figures/pretraining_loss_historic_english.png)
-
- ## Finnish model
-
- The BERT model - with texts from Finnish part of Europeana - was trained with the Hugging Face
- JAX/FLAX implementation for 40 epochs (approx. 1M steps) on a v3-8 TPU, using the following command:
-
- ```bash
- python3 run_mlm_flax.py --model_type bert \
- --config_name /mnt/datasets/bert-base-finnish-europeana-cased/ \
- --tokenizer_name /mnt/datasets/bert-base-finnish-europeana-cased/ \
- --train_file /mnt/datasets/hlms/extracted_content_Finnish_0.6.txt \
- --validation_file /mnt/datasets/hlms/finnish_validation.txt \
- --max_seq_length 512 \
- --per_device_train_batch_size 16 \
- --learning_rate 1e-4 \
- --num_train_epochs 40 \
- --preprocessing_num_workers 96 \
- --output_dir /mnt/datasets/bert-base-finnish-europeana-cased-512-dupe1-noadafactor-40e \
- --save_steps 2500 \
- --eval_steps 2500 \
- --warmup_steps 10000 \
- --line_by_line \
- --pad_to_max_length
- ```
-
- The following plot shows the pretraining loss curve:
-
- ![Training loss curve](stats/figures/pretraining_loss_finnish_europeana.png)
-
- ## Swedish model
-
- The BERT model - with texts from Swedish part of Europeana - was trained with the Hugging Face
- JAX/FLAX implementation for 40 epochs (approx. 660K steps) on a v3-8 TPU, using the following command:
-
- ```bash
- python3 run_mlm_flax.py --model_type bert \
- --config_name /mnt/datasets/bert-base-swedish-europeana-cased/ \
- --tokenizer_name /mnt/datasets/bert-base-swedish-europeana-cased/ \
- --train_file /mnt/datasets/hlms/extracted_content_Swedish_0.6.txt \
- --validation_file /mnt/datasets/hlms/swedish_validation.txt \
- --max_seq_length 512 \
- --per_device_train_batch_size 16 \
- --learning_rate 1e-4 \
- --num_train_epochs 40 \
- --preprocessing_num_workers 96 \
- --output_dir /mnt/datasets/bert-base-swedish-europeana-cased-512-dupe1-noadafactor-40e \
- --save_steps 2500 \
- --eval_steps 2500 \
- --warmup_steps 10000 \
- --line_by_line \
- --pad_to_max_length
- ```
-
- The following plot shows the pretraining loss curve:
-
- ![Training loss curve](stats/figures/pretraining_loss_swedish_europeana.png)
-
  # Acknowledgments
 
  Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC) program, previously known as
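All three removed sections run the same `run_mlm_flax.py` masked-language-modeling script and differ only in the config, tokenizer, training corpus, and epoch count. As a rough, non-authoritative sketch of how a checkpoint produced by such a run could be sanity-checked afterwards, the snippet below loads the Flax weights from the English model's `--output_dir` and fills in a masked token; the path is taken from the removed command above, and whether a finished checkpoint actually lives there is an assumption.

```python
from transformers import AutoTokenizer, FlaxAutoModelForMaskedLM

# Assumed location of a finished checkpoint: the --output_dir from the
# (removed) English training command; substitute a Hub identifier if the
# model was published there.
model_dir = "/mnt/datasets/bert-base-historic-english-cased-512-noadafactor-10e"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = FlaxAutoModelForMaskedLM.from_pretrained(model_dir)

# Predict the most likely token for the [MASK] position.
inputs = tokenizer("The steam [MASK] arrived at the station.", return_tensors="np")
logits = model(**inputs).logits

mask_position = (inputs["input_ids"][0] == tokenizer.mask_token_id).argmax()
predicted_id = int(logits[0, mask_position].argmax(-1))
print(tokenizer.decode([predicted_id]))
```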
 