Prompt sentences are tokenized and packed together to form 1024-token sequences.

Since the model is trained to predict the next token, labels are simply the input sequence shifted by one token.

Given the training format, no extra care is needed to account for different sequences: the model does not need to know which sentence a token belongs to.
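
As a concrete illustration, here is a minimal sketch of this packing-and-shifting scheme; the tokenizer checkpoint and the helper name are assumptions for the example, not the actual training code:

```python
# Minimal sketch of token packing and label shifting (illustrative only).
from transformers import AutoTokenizer

BLOCK_SIZE = 1024  # training sequence length described above

# Placeholder checkpoint; in practice this model's own tokenizer is used.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def pack_and_label(sentences):
    """Concatenate tokenized sentences, cut the stream into 1024-token
    blocks, and derive labels by shifting inputs one token to the left."""
    ids = []
    for sentence in sentences:
        ids.extend(tokenizer(sentence)["input_ids"])
    examples = []
    for start in range(0, len(ids) - BLOCK_SIZE, BLOCK_SIZE):
        block = ids[start : start + BLOCK_SIZE + 1]  # one extra token for the shift
        examples.append({"input_ids": block[:-1], "labels": block[1:]})
    return examples
```

Note that Hugging Face causal-LM classes apply this shift internally when `labels` is set equal to `input_ids`; the explicit shift above simply mirrors the description.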

### Hyperparameters

- epochs:
- optimiser: AdamW (beta1: 0.9, beta2: 0.999, eps: 1e-6, weight decay: 0.0, learning rate: 5e-6)
- learning rate schedule: warmup (min: 1e-7, max: 5e-6, warmup proportion: 0.005995)
- batch size: 128
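
The sketch below shows one way these optimiser and schedule settings could be wired up in PyTorch; the total step count, the stand-in model, and the linear warmup-then-constant shape are assumptions, since only the rates and the warmup proportion are given above:

```python
# Illustrative optimiser/schedule setup from the hyperparameters above.
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual causal LM

max_lr, min_lr = 5e-6, 1e-7
total_steps = 100_000                       # hypothetical training length
warmup_steps = int(0.005995 * total_steps)  # warmup proportion from the list

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.999),
    eps=1e-6,
    weight_decay=0.0,
)

def lr_lambda(step):
    # Linear warmup from min_lr to max_lr, then hold constant
    # (one plausible reading of the warmup schedule above).
    if step < warmup_steps:
        frac = step / max(1, warmup_steps)
        return (min_lr + (max_lr - min_lr) * frac) / max_lr
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```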

## Performance

The resulting model matches SOTA performance with 82.5% accuracy.

## How to use

The model can be loaded with `AutoModelForCausalLM`. You can use the `pipeline` API for text generation.
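
For example (the model identifier below is a placeholder for this repository's actual name on the Hugging Face Hub):

```python
# Usage sketch; replace "model-name" with this repository's Hub identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("model-name")
model = AutoModelForCausalLM.from_pretrained("model-name")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Once upon a time", max_new_tokens=50)[0]["generated_text"])
```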