## Training procedure

Trained on 16 Graphcore Mk2 IPUs using [optimum-graphcore](https://github.com/huggingface/optimum-graphcore).

Command line:

```
python examples/language-modeling/run_clm.py \
  --model_name_or_path gpt2-medium \
  --ipu_config_name Graphcore/gpt2-medium-ipu \
  --dataset_name wikitext \
  --dataset_config_name wikitext-103-raw-v1 \
  --do_train \
  --do_eval \
  --num_train_epochs 10 \
  --dataloader_num_workers 64 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 256 \
  --output_dir /tmp/clm_output_medium \
  --logging_steps 5 \
  --learning_rate 1e-5 \
  --lr_scheduler_type linear \
  --loss_scaling 16384 \
  --weight_decay 0.01 \
  --warmup_ratio 0.1 \
  --ipu_config_overrides="embedding_serialization_factor=5,inference_device_iterations=9,replication_factor=2,inference_replication_factor=2,ipus_per_replica=8,layers_per_ipu=[0 3 3 3 3 4 4 4],matmul_proportion=0.25" \
  --dataloader_drop_last \
  --pod_type pod16
```
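As a quick sanity check on the command above (assuming the usual data-parallel semantics, where the global batch size is the per-device batch size times the gradient-accumulation steps times the replication factor, and that gpt2-medium's 24 transformer blocks are what `layers_per_ipu` partitions — neither assumption is stated in this card):

```python
# Sanity checks on the training configuration above. Assumptions (not taken
# from this README): global batch = per-device batch * grad accumulation *
# replication factor, and gpt2-medium has 24 transformer layers.

per_device_train_batch_size = 1
gradient_accumulation_steps = 256
replication_factor = 2  # from --ipu_config_overrides

global_batch_size = (per_device_train_batch_size
                     * gradient_accumulation_steps
                     * replication_factor)
print(global_batch_size)  # 512 samples per weight update

# The pipeline split places 0 transformer blocks (embeddings only) on the
# first IPU, then 3+3+3+3+4+4+4 blocks across the rest of each replica.
layers_per_ipu = [0, 3, 3, 3, 3, 4, 4, 4]
print(sum(layers_per_ipu))  # 24, matching gpt2-medium's layer count
print(len(layers_per_ipu))  # 8, matching ipus_per_replica=8
```

With 2 replicas of 8 IPUs each, this accounts for the 16 IPUs of the pod16.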

### Training hyperparameters

The following hyperparameters were used during training: