sofial committed on
Commit acd926a
1 Parent(s): bbc6d74

Update README.md

Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -56,18 +56,25 @@ Fine tuning is done using the `train` split of the GLUE MNLI dataset and the per

  `validation_mismatched` means validation examples are not derived from the same sources as those in the training set and therefore do not closely resemble any of the examples seen at training time.

+ Data splits for the MNLI dataset are the following:
+ |train |validation_matched|validation_mismatched|
+ |-----:|-----------------:|--------------------:|
+ |392702|              9815|                 9832|
  ## Fine-tuning procedure
  Fine-tuned on a Graphcore IPU-POD64 using `popxl`.

  Prompt sentences are tokenized and packed together to form 1024-token sequences, following the [HF packing algorithm](https://github.com/huggingface/transformers/blob/v4.20.1/examples/pytorch/language-modeling/run_clm.py). No padding is used.
+ The packing process works in groups of 1000 examples and discards any remainder from each group that does not fill a whole sequence.
+ For the 392,702 training examples this gives a total of 17,762 sequences per epoch.
+
  Since the model is trained to predict the next token, labels are simply the input sequence shifted by one token.
  Given the training format, no extra care is needed to account for different sequences: the model does not need to know which sentence a token belongs to.

  ### Hyperparameters:
- - epochs:
  - optimiser: AdamW (beta1: 0.9, beta2: 0.999, eps: 1e-6, weight decay: 0.0, learning rate: 5e-6)
  - learning rate schedule: warmup schedule (min: 1e-7, max: 5e-6, warmup proportion: 0.005995)
  - batch size: 128
+ - training steps: 300. Each epoch consists of ceil(17,762 / 128) = 139 steps, so 300 steps correspond to approximately 2 epochs.

  ## Performance
  The resulting model matches SOTA performance with 82.5% accuracy.
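
The sketches below make some of the details in the updated section concrete. They are illustrative only: library choices, function names, and anything not stated in the model card are assumptions, not the code actually used. First, the split sizes in the added table can be checked with the Hugging Face `datasets` library, assuming the standard GLUE `mnli` configuration:

```python
# Hypothetical check of the MNLI split sizes quoted in the table above,
# assuming the standard GLUE "mnli" configuration of the `datasets` library.
from datasets import load_dataset

mnli = load_dataset("glue", "mnli")
for split in ("train", "validation_matched", "validation_mismatched"):
    print(split, mnli[split].num_rows)
# Expected: train 392702, validation_matched 9815, validation_mismatched 9832
```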
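
The packing step can be sketched as follows. This mirrors the `group_texts` idea from the linked `run_clm.py` example (concatenate a group of tokenized examples, split the result into fixed-length blocks, drop the incomplete tail); the function name and the toy data are illustrative.

```python
from itertools import chain

BLOCK_SIZE = 1024  # packed sequence length used for fine-tuning
GROUP_SIZE = 1000  # packing group size mentioned above (not used in the toy demo)

def pack_group(tokenized_examples, block_size=BLOCK_SIZE):
    """Concatenate one group of tokenized examples and split the result into
    fixed-length blocks, discarding the incomplete tail block."""
    concatenated = list(chain.from_iterable(tokenized_examples))
    total_length = (len(concatenated) // block_size) * block_size  # drop remainder
    return [concatenated[i:i + block_size] for i in range(0, total_length, block_size)]

# Toy usage: three short "tokenized" examples packed into blocks of 8 tokens.
toy_examples = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12]]
print(pack_group(toy_examples, block_size=8))
# -> [[1, 2, 3, 4, 5, 6, 7, 8]]  (the remaining 4 tokens are discarded)
```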
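
The "labels are the inputs shifted by one token" statement amounts to the following. The explicit shift and the -100 mask value are common PyTorch/Hugging Face conventions used here for illustration; the popxl implementation may equivalently pass labels equal to the inputs and perform the shift inside the loss.

```python
def make_labels(input_ids, ignore_index=-100):
    # Each position predicts the next token; the last position has nothing
    # left to predict, so it is masked out of the loss.
    return input_ids[1:] + [ignore_index]

packed = [5, 17, 42, 7]     # a (tiny) packed token sequence
print(make_labels(packed))  # -> [17, 42, 7, -100]
```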
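
Finally, the optimiser and warmup settings from the hyperparameter list could be expressed as below. This is a PyTorch rendering for illustration only (the actual run uses `popxl` on IPUs), and the schedule shape, linear warmup from the minimum to the maximum rate followed by a constant rate, is an assumption the model card does not spell out.

```python
import math
import torch

MAX_LR, MIN_LR = 5e-6, 1e-7
TOTAL_STEPS = 300
WARMUP_STEPS = max(1, math.ceil(0.005995 * TOTAL_STEPS))  # warmup proportion 0.005995

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(
    model.parameters(), lr=MAX_LR, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0
)

def lr_lambda(step):
    # LambdaLR scales the base lr (MAX_LR) by the returned factor.
    if step < WARMUP_STEPS:
        warmed = MIN_LR + (MAX_LR - MIN_LR) * (step + 1) / WARMUP_STEPS
        return warmed / MAX_LR
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```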