Update README.md
README.md CHANGED
```diff
@@ -94,5 +94,7 @@ The model achieves the following results without any fine-tuning (zero-shot):
 |arc_easy |acc/acc_norm|0.4381/0.3948 |**0.4651**/**0.4247** |**0.0082**/**0.0029** |
 |arc_challenge|acc/acc_norm|0.1903/0.2270 |0.1997/0.2329 |0.4132/0.6256 |
 
-To get these results, we used the Eleuther AI evaluation harness [here](https://github.com/EleutherAI/lm-evaluation-harness),
-which can produce results different than those reported in the GPT2 paper.
+To get these results, we used commit `4f0410a4be0049729078376ce36a42dc308b6e38` of the Eleuther AI evaluation harness [here](https://github.com/EleutherAI/lm-evaluation-harness),
+which can produce results different than those reported in the GPT2 paper.
+We added a change [here](https://github.com/EleutherAI/lm-evaluation-harness/compare/master...mathemakitten:lm-evaluation-harness:master) to enable evaluation of the OLM GPT2, which has a slightly different vocab size.
+The p-values come from the stderr reported by the evaluation harness, plus a normal distribution assumption.
```
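The linked harness change is only referenced above, not shown, so the sketch below is a hypothetical illustration of one common way to evaluate a model whose output dimension is slightly larger than its tokenizer's vocabulary: slice the logits down to the tokenizer's length before scoring. The model id `olm/olm-gpt2-dec-2022` and the slicing approach are assumptions for illustration; this is not claimed to be what the linked commit actually does.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name; substitute the OLM GPT2 model you are evaluating.
model_id = "olm/olm-gpt2-dec-2022"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, model_vocab_size)

# If the model's output dimension is larger than the tokenizer's vocabulary
# (e.g. padded for efficiency), drop the extra columns so token ids and
# logit indices line up before computing log-likelihoods.
logits = logits[..., : len(tokenizer)]
log_probs = torch.log_softmax(logits, dim=-1)
```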
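The last added line describes the significance test only in passing. As a rough illustration of what "stderr plus a normal distribution assumption" could mean in practice, the sketch below turns a difference in accuracies into a z-score and a two-sided normal p-value. The `stderr` value here is an estimate (`sqrt(p * (1 - p) / n)` with n = 1172 ARC-Challenge test examples), not a number taken from the harness output, and the choice of a two-sided test is an assumption; the README does not spell out the exact procedure.

```python
from math import erf, sqrt

def normal_p_value(acc_base: float, acc_new: float, stderr: float, two_sided: bool = True) -> float:
    """p-value for a difference in accuracies, assuming the difference
    divided by the harness-reported stderr is standard normal."""
    z = abs(acc_new - acc_base) / stderr
    tail = 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))  # P(Z > z) under a standard normal
    return 2.0 * tail if two_sided else tail

# arc_challenge accuracies from the table above; stderr is an illustrative estimate.
print(normal_p_value(0.1903, 0.1997, stderr=0.0115))  # roughly 0.41
```

With these illustrative inputs the result lands near the 0.4132 reported for arc_challenge, but the exact table values depend on the stderr actually reported by the harness run.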