Commit 36f7a59 (1 parent: 374d391), committed by nicholasKluge
Update README.md

README.md CHANGED
@@ -163,7 +163,11 @@ for i, completion in enumerate(completions):
 
 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
+| [Teeny Tiny Llama 162m](https://huggingface.co/nicholasKluge/Teeny-tiny-llama-162m) | 31.16   | 26.15                                   | 29.29                                         | 28.11                                    | 41.12                                          |
+| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)                | 31.16   | 24.06                                   | 31.39                                         | 24.86                                    | 44.34                                          |
+| [OPT-125m](https://huggingface.co/facebook/opt-125m)                                | 30.80   | 22.87                                   | 31.47                                         | 26.02                                    | 42.87                                          |
 | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48                                   | 29.62                                         | 27.36                                    | 41.44                                          |
+| [Gpt2-small](https://huggingface.co/gpt2)                                           | 29.97   | 21.48                                   | 31.60                                         | 25.79                                    | 40.65                                          |
 
 * Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
 