nicholasKluge/TeenyTinyLlama-160m

@@ -163,7 +163,11 @@ for i, completion in enumerate(completions):
 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
+| [Teeny Tiny Llama 162m](https://huggingface.co/nicholasKluge/Teeny-tiny-llama-162m) | 31.16   | 26.15                                   | 29.29                                         | 28.11                                    | 41.12                                          |
+| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)                | 31.16   | 24.06                                   | 31.39                                         | 24.86                                    | 44.34                                          |
+| [OPT-125m](https://huggingface.co/facebook/opt-125m)                                | 30.80   | 22.87                                   | 31.47                                         | 26.02                                    | 42.87                                          |
 | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48                                   | 29.62                                         | 27.36                                    | 41.44                                          |
+| [Gpt2-small](https://huggingface.co/gpt2)                                           | 29.97   | 21.48                                   | 31.60                                         | 25.79                                    | 40.65                                          |
 * Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
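
The table does not say how the Average column is derived. A minimal sketch, assuming it is the unweighted mean of the four benchmark scores (ARC, Hellaswag, MMLU, TruthfulQA), is shown below. The `scores` dictionary and the `mean_score` helper are illustrative names, not part of the model card or the evaluation harness. Under this assumption the first four rows land within 0.01 of their listed averages, while the recomputed value for Gpt2-small (about 29.88) differs from the listed 29.97, so the assumption may not hold exactly for every row.

```python
# Scores copied from the table above, in column order:
# (ARC, Hellaswag, MMLU, TruthfulQA).
scores = {
    "Teeny Tiny Llama 162m": (26.15, 29.29, 28.11, 41.12),
    "Pythia-160m": (24.06, 31.39, 24.86, 44.34),
    "OPT-125m": (22.87, 31.47, 26.02, 42.87),
    "Gpt2-portuguese-small": (22.48, 29.62, 27.36, 41.44),
    "Gpt2-small": (21.48, 31.60, 25.79, 40.65),
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed definition of 'Average')."""
    return sum(values) / len(values)


# Recompute the average for every model and list them from best to worst.
averages = {model: mean_score(vals) for model, vals in scores.items()}
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {avg:.4f}")
```

Printing four decimal places makes it visible that models sharing the same two-decimal average in the table can still differ slightly before rounding.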