Jinchen committed on
Commit
a77da42
1 Parent(s): 710a54e

Update README.md

Files changed (1)
  1. README.md +29 -0
README.md CHANGED
@@ -31,6 +31,35 @@ More information needed
 
  ## Training procedure
 
+ Trained on 16 Graphcore Mk2 IPUs using [optimum-graphcore](https://github.com/huggingface/optimum-graphcore).
+
+ Command line:
+
+ ```
+ python examples/language-modeling/run_clm.py \
+ --model_name_or_path gpt2-medium \
+ --ipu_config_name Graphcore/gpt2-medium-ipu \
+ --dataset_name wikitext \
+ --dataset_config_name wikitext-103-raw-v1 \
+ --do_train \
+ --do_eval \
+ --num_train_epochs 10 \
+ --dataloader_num_workers 64 \
+ --per_device_train_batch_size 1 \
+ --per_device_eval_batch_size 1 \
+ --gradient_accumulation_steps 256 \
+ --output_dir /tmp/clm_output_medium \
+ --logging_steps 5 \
+ --learning_rate 1e-5 \
+ --lr_scheduler_type linear \
+ --loss_scaling 16384 \
+ --weight_decay 0.01 \
+ --warmup_ratio 0.1 \
+ --ipu_config_overrides="embedding_serialization_factor=5,inference_device_iterations=9,replication_factor=2,inference_replication_factor=2,ipus_per_replica=8,layers_per_ipu=[0 3 3 3 3 4 4 4],matmul_proportion=0.25" \
+ --dataloader_drop_last \
+ --pod_type pod16
+ ```
+
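As a quick sanity check on the command above, the flags can be cross-checked against each other with a little arithmetic. The sketch below is not part of the commit; the variable names are illustrative, and the "samples per weight update" figure assumes a single device iteration per training step, which the command does not set explicitly.

```python
# Values copied from the command line in this commit.
per_device_train_batch_size = 1
gradient_accumulation_steps = 256
replication_factor = 2
ipus_per_replica = 8
layers_per_ipu = [0, 3, 3, 3, 3, 4, 4, 4]  # written space-separated in the override string

# IPUs in use: each replica spans ipus_per_replica devices.
total_ipus = replication_factor * ipus_per_replica

# Samples per weight update.
# Assumption: one device iteration per training step (not overridden in the command).
samples_per_update = (
    per_device_train_batch_size * gradient_accumulation_steps * replication_factor
)

# Transformer layers placed across the pipeline stages of one replica.
total_layers = sum(layers_per_ipu)

print(total_ipus, samples_per_update, total_layers)  # prints: 16 512 24
```

The 16 IPUs agree with `--pod_type pod16` and the "16 Graphcore Mk2 IPUs" statement, and the 24 layers agree with the depth of `gpt2-medium`. The first pipeline stage holds no transformer layers in this split.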
  ### Training hyperparameters
 
  The following hyperparameters were used during training: