nicholasKluge committed on
Commit ea1e122
1 Parent(s): 0644137

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -168,14 +168,14 @@ for i, completion in enumerate(completions):
  | Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
  |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
  | [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 33.01 | 29.40 | 33.00 | 28.55 | 41.10 |
- | [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)* | 32.13 | 24.74 | 37.15 | 24.22 | 42.44 |
+ | [Bloom-560m](https://huggingface.co/bigscience/bloom-560m) | 32.13 | 24.74* | 37.15* | 24.22* | 42.44* |
  | [Xglm-564M](https://huggingface.co/facebook/xglm-564M) | 31.97 | 25.56 | 34.64* | 25.18* | 42.53 |
  | [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
- | [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)* | 31.16 | 24.06 | 31.39 | 24.86 | 44.34 |
- | [OPT-125m](https://huggingface.co/facebook/opt-125m)* | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
- | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
- | [Gpt2-small](https://huggingface.co/gpt2)* | 29.97 | 21.48 | 31.60 | 25.79 | 40.65 |
- | [Multilingual GPT](https://huggingface.co/ai-forever/mGPT)* | 28.73 | 23.81 | 26.37 | 25.17 | 39.62 |
+ | [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped) | 31.16 | 24.06* | 31.39* | 24.86* | 44.34* |
+ | [OPT-125m](https://huggingface.co/facebook/opt-125m) | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
+ | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48* | 29.62* | 27.36* | 41.44* |
+ | [Gpt2-small](https://huggingface.co/gpt2) | 29.97 | 21.48* | 31.60* | 25.79* | 40.65* |
+ | [Multilingual GPT](https://huggingface.co/ai-forever/mGPT) | 29.45 | 24.79 | 26.37* | 25.17* | 41.50 |
 
  - Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
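This commit moves the "*" marker from model names to individual score cells, so it is worth checking that each row still hangs together. The following is a minimal sketch for re-checking the Average column of the updated table. It assumes that Average is the unweighted mean of the four benchmark scores; the README does not say so, and the helper names `parse_row` and `check_averages` are hypothetical. The reported averages look truncated rather than rounded: for example, Bloom-560m's four scores average to 32.1375, which is reported as 32.13. For that reason the sketch uses a 0.01 tolerance.

```python
import re

# Rows copied from the updated README table (the "+" side of the diff).
# A trailing "*" marks a score taken from the Open LLM Leaderboard; it is
# stripped before parsing because it annotates the source, not the value.
TABLE_ROWS = """\
| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 33.01 | 29.40 | 33.00 | 28.55 | 41.10 |
| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m) | 32.13 | 24.74* | 37.15* | 24.22* | 42.44* |
| [Xglm-564M](https://huggingface.co/facebook/xglm-564M) | 31.97 | 25.56 | 34.64* | 25.18* | 42.53 |
| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped) | 31.16 | 24.06* | 31.39* | 24.86* | 44.34* |
| [OPT-125m](https://huggingface.co/facebook/opt-125m) | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48* | 29.62* | 27.36* | 41.44* |
| [Gpt2-small](https://huggingface.co/gpt2) | 29.97 | 21.48* | 31.60* | 25.79* | 40.65* |
| [Multilingual GPT](https://huggingface.co/ai-forever/mGPT) | 29.45 | 24.79 | 26.37* | 25.17* | 41.50 |
"""


def parse_row(line):
    """Split one markdown table row into (model name, reported average, benchmark scores)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    # The model cell is a markdown link; keep only the link text.
    match = re.search(r"\[([^\]]+)\]", cells[0])
    name = match.group(1) if match else cells[0]
    numbers = [float(cell.rstrip("*")) for cell in cells[1:]]
    return name, numbers[0], numbers[1:]


def check_averages(rows, tolerance=0.01):
    """Return (name, reported, recomputed) for rows whose Average differs from the mean by more than tolerance."""
    mismatches = []
    for line in rows.strip().splitlines():
        name, reported, scores = parse_row(line)
        recomputed = sum(scores) / len(scores)  # assumed: unweighted mean of ARC, Hellaswag, MMLU, TruthfulQA
        if abs(recomputed - reported) > tolerance:
            mismatches.append((name, reported, round(recomputed, 4)))
    return mismatches


if __name__ == "__main__":
    for name, reported, recomputed in check_averages(TABLE_ROWS):
        print(f"{name}: reported {reported}, mean of four scores {recomputed}")
```

On the updated rows, only Gpt2-small is flagged. Its four scores average to 29.88, but the table reports 29.97. Every other row agrees to within 0.01. The check cannot tell which number is off (the Average may, for instance, come from a different source than the per-benchmark cells), so a flag is a prompt to re-check the row, not a correction.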