Transformers
PyTorch
Graphcore
bert
Generated from Trainer
Inference Endpoints
jimypbr committed on
Commit
f09ee5e
1 Parent(s): 4d6353c

Update README.md

Files changed (1)
  1. README.md +41 -5
README.md CHANGED
@@ -13,6 +13,9 @@ model-index:
13
 
14
  This model is a pre-trained BERT-Base trained in two phases on the [Graphcore/wikipedia-bert-128](https://huggingface.co/datasets/Graphcore/wikipedia-bert-128) and [Graphcore/wikipedia-bert-512](https://huggingface.co/datasets/Graphcore/wikipedia-bert-512) datasets.
15
 
 
 
 
16
  ## Model description
17
 
18
  Pre-trained BERT Base model trained on Wikipedia data.
@@ -28,7 +31,9 @@ Trained on wikipedia datasets:
28
  ## Training procedure
29
 
30
  Trained with the MLM and NSP pre-training scheme from [Large Batch Optimization for Deep Learning: Training BERT in 76 minutes](https://arxiv.org/abs/1904.00962).
31
- Trained on 16 Graphcore Mk2 IPUs using [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore)
 
 
32
 
33
  Command lines:
34
 
@@ -37,11 +42,11 @@ Phase 1:
37
  python examples/language-modeling/run_pretraining.py \
38
  --config_name bert-base-uncased \
39
  --tokenizer_name bert-base-uncased \
 
 
40
  --do_train \
41
  --logging_steps 5 \
42
  --max_seq_length 128 \
43
- --ipu_config_name Graphcore/bert-base-ipu \
44
- --dataset_name Graphcore/wikipedia-bert-128 \
45
  --max_steps 10500 \
46
  --is_already_preprocessed \
47
  --dataloader_num_workers 64 \
@@ -66,12 +71,12 @@ Phase 2:
66
  python examples/language-modeling/run_pretraining.py \
67
  --config_name bert-base-uncased \
68
  --tokenizer_name bert-base-uncased \
 
 
69
  --model_name_or_path ./output-pretrain-bert-base-phase1 \
70
  --do_train \
71
  --logging_steps 5 \
72
  --max_seq_length 512 \
73
- --ipu_config_name Graphcore/bert-base-ipu \
74
- --dataset_name Graphcore/wikipedia-bert-512 \
75
  --max_steps 2038 \
76
  --is_already_preprocessed \
77
  --dataloader_num_workers 128 \
@@ -129,3 +134,34 @@ The following hyperparameters were used during phase 2 training:
129
  - Pytorch 1.10.0+cpu
130
  - Datasets 1.18.3.dev0
131
  - Tokenizers 0.10.3
 
13
 
14
  This model is a pre-trained BERT-Base trained in two phases on the [Graphcore/wikipedia-bert-128](https://huggingface.co/datasets/Graphcore/wikipedia-bert-128) and [Graphcore/wikipedia-bert-512](https://huggingface.co/datasets/Graphcore/wikipedia-bert-512) datasets.
15
 
16
+ It was trained on a Graphcore IPU-POD16 using [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
17
+ Graphcore and Hugging Face are working together to make training of Transformer models on IPUs fast and easy. Learn more about how to take advantage of the power of Graphcore IPUs to train Transformers models at [hf.co/hardware/graphcore](https://huggingface.co/hardware/graphcore).
18
+
19
  ## Model description
20
 
21
  Pre-trained BERT Base model trained on Wikipedia data.
 
31
  ## Training procedure
32
 
33
  Trained with the MLM and NSP pre-training scheme from [Large Batch Optimization for Deep Learning: Training BERT in 76 minutes](https://arxiv.org/abs/1904.00962).
34
+ Trained on a Graphcore IPU-POD16 using [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
35
+
36
+ It was trained with the IPUConfig [Graphcore/bert-base-ipu](https://huggingface.co/Graphcore/bert-base-ipu/).
37
 
38
  Command lines:
39
 
 
42
  python examples/language-modeling/run_pretraining.py \
43
  --config_name bert-base-uncased \
44
  --tokenizer_name bert-base-uncased \
45
+ --ipu_config_name Graphcore/bert-base-ipu \
46
+ --dataset_name Graphcore/wikipedia-bert-128 \
47
  --do_train \
48
  --logging_steps 5 \
49
  --max_seq_length 128 \
 
 
50
  --max_steps 10500 \
51
  --is_already_preprocessed \
52
  --dataloader_num_workers 64 \
 
71
  python examples/language-modeling/run_pretraining.py \
72
  --config_name bert-base-uncased \
73
  --tokenizer_name bert-base-uncased \
74
+ --ipu_config_name Graphcore/bert-base-ipu \
75
+ --dataset_name Graphcore/wikipedia-bert-512 \
76
  --model_name_or_path ./output-pretrain-bert-base-phase1 \
77
  --do_train \
78
  --logging_steps 5 \
79
  --max_seq_length 512 \
 
 
80
  --max_steps 2038 \
81
  --is_already_preprocessed \
82
  --dataloader_num_workers 128 \
 
134
  - Pytorch 1.10.0+cpu
135
  - Datasets 1.18.3.dev0
136
  - Tokenizers 0.10.3
137
+
138
+ ## Fine-tuning with these weights
139
+
140
+ These weights can be used in either `transformers` or [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
141
+
142
+ For example, to fine-tune on the GLUE task SST2 with `optimum-graphcore`, you can run:
143
+
144
+ ```
145
+ export TOKENIZERS_PARALLELISM=true
146
+ python examples/text-classification/run_glue.py \
147
+ --model_name_or_path bert-base-uncased \
148
+ --ipu_config_name Graphcore/bert-base-ipu \
149
+ --task_name sst2 \
150
+ --do_train \
151
+ --do_eval \
152
+ --max_seq_length 128 \
153
+ --per_device_train_batch_size 1 \
154
+ --per_device_eval_batch_size 4 \
155
+ --gradient_accumulation_steps 32 \
156
+ --pod_type pod4 \
157
+ --learning_rate 2e-5 \
158
+ --lr_scheduler_type linear \
159
+ --warmup_ratio 0.25 \
160
+ --num_train_epochs 3 \
161
+ --seed 1984 \
162
+ --save_steps -1 \
163
+ --dataloader_num_workers 64 \
164
+ --dataloader_drop_last \
165
+ --overwrite_output_dir \
166
+ --output_dir /tmp/sst2
167
+ ```
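
The fine-tuning command above combines `--learning_rate 2e-5`, `--lr_scheduler_type linear`, and `--warmup_ratio 0.25`. As a rough illustration of what that combination usually means, the sketch below computes a learning rate that ramps up linearly during warmup and then decays linearly to zero. It is a plain-Python approximation, not the code path that `optimum-graphcore` or `transformers` runs. The function name `linear_lr` and the total of 1000 steps are hypothetical and chosen only for illustration. The rounding of the warmup length (ratio times total steps, rounded up) is an assumption.

```python
import math


def linear_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.25):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    # Assumption: warmup length is the warmup ratio times the total step
    # count, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not taken from the README
    for s in (0, 125, 250, 625, 1000):
        print(f"step {s:4d}: lr = {linear_lr(s, total):.2e}")
```

With a warmup ratio of 0.25, the first quarter of training runs below the peak learning rate. The real number of optimizer steps depends on the dataset size and on the global batch size implied by the IPU configuration, neither of which is stated here.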