Jinchen committed
Commit 42977d5
1 Parent(s): 1f3f3c8

Update README.md

Files changed (1)
  1. README.md +29 -0
README.md CHANGED
@@ -32,6 +32,35 @@ More information needed

## Training procedure

+ Trained on 16 Graphcore Mk2 IPUs using [optimum-graphcore](https://github.com/huggingface/optimum-graphcore).
+
+ Command line:
+
+ ```
+ python examples/language-modeling/run_clm.py \
+ --model_name_or_path gpt2 \
+ --ipu_config_name Graphcore/gpt2-small-ipu \
+ --dataset_name wikitext \
+ --dataset_config_name wikitext-103-raw-v1 \
+ --do_train \
+ --do_eval \
+ --num_train_epochs 10 \
+ --dataloader_num_workers 64 \
+ --per_device_train_batch_size 1 \
+ --per_device_eval_batch_size 1 \
+ --gradient_accumulation_steps 128 \
+ --output_dir /tmp/clm_output \
+ --logging_steps 5 \
+ --learning_rate 1e-5 \
+ --lr_scheduler_type linear \
+ --loss_scaling 16384 \
+ --weight_decay 0.01 \
+ --warmup_ratio 0.1 \
+ --ipu_config_overrides="embedding_serialization_factor=4,optimizer_state_offchip=true,inference_device_iterations=5" \
+ --dataloader_drop_last \
+ --pod_type pod16
+ ```
+

### Training hyperparameters

The following hyperparameters were used during training:
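
For reference, with `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 128`, each weight update accumulates 1 × 128 = 128 samples per data-parallel replica; the overall global batch size additionally depends on the replication factor defined by the `Graphcore/gpt2-small-ipu` IPU config, which is not shown in this commit.

Once training finishes, the checkpoint written to `--output_dir` can be loaded back with plain `transformers` for generation. The sketch below is not part of the original commit; it assumes the `/tmp/clm_output` directory from the command above still contains the saved model and tokenizer:

```python
# Minimal sketch (assumption: the fine-tuned model and tokenizer were saved
# to /tmp/clm_output, the --output_dir used in the training command above).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "/tmp/clm_output"
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)

# Generate a short continuation from a prompt to sanity-check the checkpoint.
inputs = tokenizer("The history of natural language processing", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```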