nicholasKluge committed
Commit 075aa74
1 Parent(s): d0f5027

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -170,16 +170,16 @@ for i, completion in enumerate(completions):
  | Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
  |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
  | [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 33.01 | 29.40 | 33.00 | 28.55 | 41.10 |
+ | [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)* | 32.13 | 24.74 | 37.15 | 24.22 | 42.44 |
+ | [Xglm-564M](https://huggingface.co/facebook/xglm-564M) | 31.97 | 25.56 | 34.64* | 25.18* | 42.53 |
  | [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
  | [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)* | 31.16 | 24.06 | 31.39 | 24.86 | 44.34 |
  | [OPT-125m](https://huggingface.co/facebook/opt-125m)* | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
  | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
  | [Gpt2-small](https://huggingface.co/gpt2)* | 29.97 | 21.48 | 31.60 | 25.79 | 40.65 |
- | [Xglm-564M](https://huggingface.co/facebook/xglm-564M)* | 31.20 | 24.57 | 34.64 | 25.18 | 40.43 |
- | [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)* | 32.13 | 24.74 | 37.15 | 24.22 | 42.44 |
  | [Multilingual GPT](https://huggingface.co/ai-forever/mGPT)* | 28.73 | 23.81 | 26.37 | 25.17 | 39.62 |

- - Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were retirved from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+ - Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

  ## Fine-Tuning Comparisons
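As a side note on the evaluation setup named in the diff: below is a minimal sketch of how scores like these can be produced with the harness's Python API. It assumes lm-evaluation-harness v0.4+; the task list, device string, and few-shot defaults are illustrative assumptions, not the exact configuration behind the table.

```python
# A minimal sketch of scoring a model with the LM-Evaluation-Harness
# Python API (assumes lm-evaluation-harness v0.4+; task names, device,
# and few-shot settings are illustrative, not the exact configuration
# used for the table in the README).
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # Hugging Face `transformers` backend
    model_args="pretrained=nicholasKluge/TeenyTinyLlama-460m",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    device="cuda:0",
)

# Per-task metrics (accuracy, etc.) are collected under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can also be launched from the harness's `lm_eval` command-line entry point; either route reports per-task metrics comparable to the columns above.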