nicholasKluge/TeenyTinyLlama-160m

@@ -168,14 +168,14 @@ for i, completion in enumerate(completions):
 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
 | [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m)     | 33.01   | 29.40                                   | 33.00                                         | 28.55                                    | 41.10                                          |
-| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)*                         | 32.13   | 24.74                                   | 37.15                                         | 24.22                                    | 42.44                                          |
 | [Xglm-564M](https://huggingface.co/facebook/xglm-564M)                              | 31.97   | 25.56                                   | 34.64*                                        | 25.18*                                   | 42.53                                          |
 | [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m)     | 31.16   | 26.15                                   | 29.29                                         | 28.11                                    | 41.12                                          |
-| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)*               | 31.16   | 24.06                                   | 31.39                                         | 24.86                                    | 44.34                                          |
-| [OPT-125m](https://huggingface.co/facebook/opt-125m)*                               | 30.80   | 22.87                                   | 31.47                                         | 26.02                                    | 42.87                                          |
-| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48                                   | 29.62                                         | 27.36                                    | 41.44                                          |
-| [Gpt2-small](https://huggingface.co/gpt2)*                                          | 29.97   | 21.48                                   | 31.60                                         | 25.79                                    | 40.65                                          |
-| [Multilingual GPT](https://huggingface.co/ai-forever/mGPT)*                         | 28.73   | 23.81                                   | 26.37                                         | 25.17                                    | 39.62                                          |
 - Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
 | [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m)     | 33.01   | 29.40                                   | 33.00                                         | 28.55                                    | 41.10                                          |
+| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)                          | 32.13   | 24.74*                                  | 37.15*                                        | 24.22*                                   | 42.44*                                         |
 | [Xglm-564M](https://huggingface.co/facebook/xglm-564M)                              | 31.97   | 25.56                                   | 34.64*                                        | 25.18*                                   | 42.53                                          |
 | [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m)     | 31.16   | 26.15                                   | 29.29                                         | 28.11                                    | 41.12                                          |
+| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)                | 31.16   | 24.06*                                  | 31.39*                                        | 24.86*                                   | 44.34*                                         |
+| [OPT-125m](https://huggingface.co/facebook/opt-125m)                                | 30.80   | 22.87                                   | 31.47                                         | 26.02                                    | 42.87                                          |
+| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48*                                  | 29.62*                                        | 27.36*                                   | 41.44*                                         |
+| [Gpt2-small](https://huggingface.co/gpt2)                                           | 29.97   | 21.48*                                  | 31.60*                                        | 25.79*                                   | 40.65*                                         |
+| [Multilingual GPT](https://huggingface.co/ai-forever/mGPT)                          | 29.45   | 24.79                                   | 26.37*                                        | 25.17*                                   | 41.50                                          |
 - Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
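For the updated rows checked below, the Average column matches the arithmetic mean of the four benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA) cut to two decimals rather than rounded. For example, Bloom-560m gives 128.55 / 4 = 32.1375, listed as 32.13. The sketch below assumes that truncation rule. It is inferred from the listed values and is not stated in the model card. The names `SCORES` and `table_average` are hypothetical, used only for illustration.

```python
from decimal import Decimal, ROUND_DOWN

# Per-benchmark scores copied from the updated table: ARC, HellaSwag, MMLU, TruthfulQA.
SCORES = {
    "TeenyTinyLlama-160m": ["26.15", "29.29", "28.11", "41.12"],
    "Bloom-560m": ["24.74", "37.15", "24.22", "42.44"],
    "Multilingual GPT": ["24.79", "26.37", "25.17", "41.50"],
}


def table_average(scores):
    """Mean of the benchmark scores, truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the listed averages.
    Decimal avoids binary floating-point error in the division.
    """
    values = [Decimal(s) for s in scores]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


for name, scores in SCORES.items():
    print(name, table_average(scores))
```

Run as-is, this prints 31.16, 32.13, and 29.45, matching the Average column for those three rows. Rounding half-up instead would give 31.17, 32.14, and 29.46.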