Text Generation
Transformers
PyTorch
English
gpt2
causal-lm
text-generation-inference
rskuzma committed
Commit c647fc8
1 Parent(s): 21695fd

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -125,7 +125,7 @@ Model Params | Sequence Length | Batch Size | Number of Steps | Tokens | Tokens
 
 ## Evaluations
 
-We evaluate our models on the PILE validation set comprising 380M tokens. We also evaluate the public checkpoints of Pythia Eleuther (2022), OPT Zhang et al. (2022), GPT-NeoX 20B Black et al. (2022), and GPT-J 6B Wang & Komatsuzaki (2021). We trained models from smallest to largest and fit a power law as we went along. The power law was helpful for extrapolating the validation loss of the next largest model we trained and provided confidence about whether the training run was going well.
+We evaluate our models on the PILE validation set comprising 380M tokens. In our paper we also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki (2021). We trained models from smallest to largest and fit a power law as we went along. The power law was helpful for extrapolating the validation loss of the next largest model we trained and provided confidence about whether the training run was going well.
 
 #### 0-shot Evaluation
 | Model | Params | Training FLOPs | PILE test xent | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA | Downstream Average |
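
As a rough illustration of the power-law extrapolation mentioned in the edited paragraph, the sketch below fits a saturating power law L(N) = a·N^(-b) + c to (parameter count, PILE validation loss) pairs of smaller models and predicts the loss of the next larger one. This is a minimal sketch, not the authors' actual code: the functional form, the helper names, and all parameter counts and loss values are placeholder assumptions.

```python
# Hypothetical sketch of extrapolating validation loss with a fitted power law.
# All numbers below are made-up placeholders, not reported results.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params_m, a, b, c):
    """Saturating power law: loss decays as a * N^-b toward an irreducible floor c."""
    return a * np.power(n_params_m, -b) + c

# Placeholder (params in millions, PILE validation loss) for already-trained models.
params_m = np.array([111.0, 256.0, 590.0, 1300.0, 2700.0])
val_loss = np.array([2.61, 2.49, 2.38, 2.28, 2.20])

# Fit the power law to the smaller models.
(a, b, c), _ = curve_fit(power_law, params_m, val_loss, p0=[5.0, 0.3, 1.5], maxfev=10000)

# Extrapolate to the next model size as a sanity check on the training run.
next_params_m = 6700.0
predicted = power_law(next_params_m, a, b, c)
print(f"Predicted PILE validation loss at {next_params_m:.0f}M params: {predicted:.3f}")
```

If the measured validation loss of the next run lands near the extrapolated value, that is evidence the larger training run is on track; a large gap would flag a problem early.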