Text Generation
Transformers
PyTorch
English
gpt2
causal-lm
text-generation-inference
rskuzma committed
Commit c647fc8
1 Parent(s): 21695fd

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -125,7 +125,7 @@ Model Params | Sequence Length | Batch Size | Number of Steps | Tokens | Tokens
 
 ## Evaluations
 
-We evaluate our models on the PILE validation set comprising 380M tokens. We also evaluate the public checkpoints of Pythia Eleuther (2022), OPT Zhang et al. (2022), GPT-NeoX 20B Black et al. (2022), and GPT-J 6B Wang & Komatsuzaki (2021). We trained models from smallest to largest and fit a power law as we went along. The power law was helpful for extrapolating the validation loss of the next largest model we trained and provided confidence about whether the training run was going well.
+We evaluate our models on the PILE validation set comprising 380M tokens. In our paper we also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki (2021). We trained models from smallest to largest and fit a power law as we went along. The power law was helpful for extrapolating the validation loss of the next largest model we trained and provided confidence about whether the training run was going well.
 
 #### 0-shot Evaluation
 | Model | Params | Training FLOPs | PILE test xent | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA | Downstream Average |
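
As a rough illustration of the power-law extrapolation mentioned in the edited paragraph, the sketch below fits a saturating power law L(N) = a·N^(-b) + c to (parameter count, PILE validation loss) pairs of smaller models and predicts the loss of the next larger one. This is a minimal sketch, not the authors' actual code: the functional form, the helper names, and all parameter counts and loss values are placeholder assumptions.

```python
# Hypothetical sketch of extrapolating validation loss with a fitted power law.
# All numbers below are made-up placeholders, not reported results.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params_m, a, b, c):
    """Saturating power law: loss decays as a * N^-b toward an irreducible floor c."""
    return a * np.power(n_params_m, -b) + c

# Placeholder (params in millions, PILE validation loss) for already-trained models.
params_m = np.array([111.0, 256.0, 590.0, 1300.0, 2700.0])
val_loss = np.array([2.61, 2.49, 2.38, 2.28, 2.20])

# Fit the power law to the smaller models.
(a, b, c), _ = curve_fit(power_law, params_m, val_loss, p0=[5.0, 0.3, 1.5], maxfev=10000)

# Extrapolate to the next model size as a sanity check on the training run.
next_params_m = 6700.0
predicted = power_law(next_params_m, a, b, c)
print(f"Predicted PILE validation loss at {next_params_m:.0f}M params: {predicted:.3f}")
```

If the measured validation loss of the next run lands near the extrapolated value, that is evidence the larger training run is on track; a large gap would flag a problem early.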