Prompt sentences are tokenized and packed together to form 1024-token sequences.

Since the model is trained to predict the next token, labels are simply the input sequence shifted by one token.

Given the training format, no extra care is needed to account for different sequences: the model does not need to know which sentence a token belongs to.
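
As a concrete illustration, here is a minimal sketch of this packing-and-shifting scheme; the tokenizer checkpoint and the helper name are assumptions for the example, not the actual training code:

```python
# Minimal sketch of token packing and label shifting (illustrative only).
from transformers import AutoTokenizer

BLOCK_SIZE = 1024  # training sequence length described above

# Placeholder checkpoint; in practice this model's own tokenizer is used.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def pack_and_label(sentences):
    """Concatenate tokenized sentences, cut the stream into 1024-token
    blocks, and derive labels by shifting inputs one token to the left."""
    ids = []
    for sentence in sentences:
        ids.extend(tokenizer(sentence)["input_ids"])
    examples = []
    for start in range(0, len(ids) - BLOCK_SIZE, BLOCK_SIZE):
        block = ids[start : start + BLOCK_SIZE + 1]  # one extra token for the shift
        examples.append({"input_ids": block[:-1], "labels": block[1:]})
    return examples
```

Note that Hugging Face causal-LM classes apply this shift internally when `labels` is set equal to `input_ids`; the explicit shift above simply mirrors the description.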

### Hyperparameters

- epochs:
- optimiser: AdamW (beta1: 0.9, beta2: 0.999, eps: 1e-6, weight decay: 0.0, learning rate: 5e-6)
- learning rate schedule: warmup (min: 1e-7, max: 5e-6, warmup proportion: 0.005995)
- batch size: 128
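
The sketch below shows one way these optimiser and schedule settings could be wired up in PyTorch; the total step count, the stand-in model, and the linear warmup-then-constant shape are assumptions, since only the rates and the warmup proportion are given above:

```python
# Illustrative optimiser/schedule setup from the hyperparameters above.
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual causal LM

max_lr, min_lr = 5e-6, 1e-7
total_steps = 100_000                       # hypothetical training length
warmup_steps = int(0.005995 * total_steps)  # warmup proportion from the list

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.999),
    eps=1e-6,
    weight_decay=0.0,
)

def lr_lambda(step):
    # Linear warmup from min_lr to max_lr, then hold constant
    # (one plausible reading of the warmup schedule above).
    if step < warmup_steps:
        frac = step / max(1, warmup_steps)
        return (min_lr + (max_lr - min_lr) * frac) / max_lr
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```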

## Performance

The resulting model matches SOTA performance with 82.5% accuracy.

## How to use

The model can be loaded with `AutoModelForCausalLM`. You can use the `pipeline` API for text generation.
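
For example (the model identifier below is a placeholder for this repository's actual name on the Hugging Face Hub):

```python
# Usage sketch; replace "model-name" with this repository's Hub identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("model-name")
model = AutoModelForCausalLM.from_pretrained("model-name")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Once upon a time", max_new_tokens=50)[0]["generated_text"])
```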