Update README.md
README.md CHANGED
@@ -27,6 +27,36 @@ Trained on wikipedia datasets:
 - [Graphcore/wikipedia-bert-128](https://huggingface.co/datasets/Graphcore/wikipedia-bert-128)
 - [Graphcore/wikipedia-bert-512](https://huggingface.co/datasets/Graphcore/wikipedia-bert-512)
 
+## Fine-tuning with these weights
+
+These weights can be used in either `transformers` or [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
+
+For example, to fine-tune the GLUE task SST2 with `optimum-graphcore` you can do:
+
+```
+export TOKENIZERS_PARALLELISM=true
+python examples/text-classification/run_glue.py \
+--model_name_or_path bert-base-uncased \
+--ipu_config_name Graphcore/bert-base-ipu \
+--task_name sst2 \
+--do_train \
+--do_eval \
+--max_seq_length 128 \
+--per_device_train_batch_size 1 \
+--per_device_eval_batch_size 4 \
+--gradient_accumulation_steps 32 \
+--pod_type pod4 \
+--learning_rate 2e-5 \
+--lr_scheduler_type linear \
+--warmup_ratio 0.25 \
+--num_train_epochs 3 \
+--seed 1984 \
+--save_steps -1 \
+--dataloader_num_workers 64 \
+--dataloader_drop_last \
+--overwrite_output_dir \
+--output_dir /tmp/sst2
+```
 
 ## Training procedure
 
@@ -133,35 +163,4 @@ The following hyperparameters were used during phase 2 training:
 - Transformers 4.17.0.dev0
 - Pytorch 1.10.0+cpu
 - Datasets 1.18.3.dev0
 - Tokenizers 0.10.3
-
-## Fine-tuning with these weights
-
-These weights can be used in either `transformers` or [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
-
-For example, to fine-tune the GLUE task SST2 with `optimum-graphcore` you can do:
-
-```
-export TOKENIZERS_PARALLELISM=true
-python examples/text-classification/run_glue.py \
---model_name_or_path bert-base-uncased \
---ipu_config_name Graphcore/bert-base-ipu \
---task_name sst2 \
---do_train \
---do_eval \
---max_seq_length 128 \
---per_device_train_batch_size 1 \
---per_device_eval_batch_size 4 \
---gradient_accumulation_steps 32 \
---pod_type pod4 \
---learning_rate 2e-5 \
---lr_scheduler_type linear \
---warmup_ratio 0.25 \
---num_train_epochs 3 \
---seed 1984 \
---save_steps -1 \
---dataloader_num_workers 64 \
---dataloader_drop_last \
---overwrite_output_dir \
---output_dir /tmp/sst2
-```
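The moved section also says the weights work with plain `transformers`. Below is a minimal sketch of that route, assuming the standard `Trainer` on CPU/GPU rather than IPUs: it starts from `bert-base-uncased` (the same `--model_name_or_path` as the CLI example), reuses the CLI hyperparameters where they carry over, and omits the IPU-specific flags (`--ipu_config_name`, `--pod_type`), which have no `Trainer` equivalent.

```python
# Hypothetical sketch: fine-tune on GLUE SST-2 with the vanilla transformers
# Trainer instead of optimum-graphcore. Hyperparameters mirror the CLI example;
# IPU-specific options (--ipu_config_name, --pod_type) are omitted.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"  # same --model_name_or_path as the CLI example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GLUE SST-2 is single-sentence binary sentiment classification.
raw = load_dataset("glue", "sst2")
encoded = raw.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="/tmp/sst2",
    overwrite_output_dir=True,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.25,
    num_train_epochs=3,
    seed=1984,
    dataloader_drop_last=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
print(trainer.evaluate())
```

On a GPU you would typically raise `per_device_train_batch_size` and drop the gradient accumulation; they are kept here only to match the CLI example above.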