Transformers · PyTorch · Graphcore · bert · Generated from Trainer · Inference Endpoints
jimypbr committed · Commit 923357d · 1 Parent(s): f09ee5e

Update README.md

Files changed (1)
  1. README.md +31 -32
README.md CHANGED
@@ -27,6 +27,36 @@ Trained on wikipedia datasets:
 - [Graphcore/wikipedia-bert-128](https://huggingface.co/datasets/Graphcore/wikipedia-bert-128)
 - [Graphcore/wikipedia-bert-512](https://huggingface.co/datasets/Graphcore/wikipedia-bert-512)

+ ## Fine-tuning with these weights
+
+ These weights can be used in either `transformers` or [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
+
+ For example, to fine-tune the GLUE task SST2 with `optimum-graphcore` you can do:
+
+ ```
+ export TOKENIZERS_PARALLELISM=true
+ python examples/text-classification/run_glue.py \
+ --model_name_or_path bert-base-uncased \
+ --ipu_config_name Graphcore/bert-base-ipu \
+ --task_name sst2 \
+ --do_train \
+ --do_eval \
+ --max_seq_length 128 \
+ --per_device_train_batch_size 1 \
+ --per_device_eval_batch_size 4 \
+ --gradient_accumulation_steps 32 \
+ --pod_type pod4 \
+ --learning_rate 2e-5 \
+ --lr_scheduler_type linear \
+ --warmup_ratio 0.25 \
+ --num_train_epochs 3 \
+ --seed 1984 \
+ --save_steps -1 \
+ --dataloader_num_workers 64 \
+ --dataloader_drop_last \
+ --overwrite_output_dir \
+ --output_dir /tmp/sst2
+ ```

 ## Training procedure

@@ -133,35 +163,4 @@ The following hyperparameters were used during phase 2 training:
 - Transformers 4.17.0.dev0
 - Pytorch 1.10.0+cpu
 - Datasets 1.18.3.dev0
- - Tokenizers 0.10.3
-
- ## Fine-tuning with these weights
-
- These weights can be used in either `transformers` or [`optimum-graphcore`](https://github.com/huggingface/optimum-graphcore).
-
- For example, to fine-tune the GLUE task SST2 with `optimum-graphcore` you can do:
-
- ```
- export TOKENIZERS_PARALLELISM=true
- python examples/text-classification/run_glue.py \
- --model_name_or_path bert-base-uncased \
- --ipu_config_name Graphcore/bert-base-ipu \
- --task_name sst2 \
- --do_train \
- --do_eval \
- --max_seq_length 128 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 4 \
- --gradient_accumulation_steps 32 \
- --pod_type pod4 \
- --learning_rate 2e-5 \
- --lr_scheduler_type linear \
- --warmup_ratio 0.25 \
- --num_train_epochs 3 \
- --seed 1984 \
- --save_steps -1 \
- --dataloader_num_workers 64 \
- --dataloader_drop_last \
- --overwrite_output_dir \
- --output_dir /tmp/sst2
- ```
+ - Tokenizers 0.10.3
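
Besides the `optimum-graphcore` command added above, the new section also says the weights can be used with plain `transformers`, which is just the usual auto-class loading. A minimal sketch, assuming the checkpoint id is `Graphcore/bert-base-uncased` (the repo id is not stated in this diff) and that the uploaded weights include the masked-LM head:

```
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed checkpoint id: substitute the actual repo this README belongs to.
checkpoint = "Graphcore/bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Sanity-check the pretrained weights with a single masked-token prediction.
inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and decode the highest-scoring token.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```

From there the checkpoint can be fine-tuned with the standard `Trainer` API on CPU/GPU, or with the IPU-specific trainer from `optimum-graphcore` as in the command above.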