Ludwig Stumpp committed on
Commit
53be3b4
1 Parent(s): 3a7dc42

Add additional LAMBADA entries

Files changed (1)
  1. README.md +37 -36
README.md CHANGED
@@ -22,42 +22,43 @@ We are always happy for contributions! You can contribute by the following:
 
  ## Leaderboard
 
- | Model Name | Chatbot Arena Elo | HumanEval-Python (pass@1) | LAMBADA (zero-shot) | TriviaQA (zero-shot) |
- | ------------------------------------------------------------------------------------------- | ------------------------------------------------ | ------------------------------------------------------------------------------ | --------------------------------------------- | --------------------------------------------- |
- | [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
- | [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
- | [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [code-cushman-001](https://arxiv.org/abs/2107.03374) | | [33.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [code-davinci-002](https://arxiv.org/abs/2207.10397v2) | | [65.8](https://arxiv.org/abs/2207.10397v2) | | |
- | [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) | | [29.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [codegen-16B-multi](https://huggingface.co/Salesforce/codegen-16B-multi) | | [18.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [codegx-13b](http://keg.cs.tsinghua.edu.cn/codegeex/) | | [22.9](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [codex-12b](https://arxiv.org/abs/2107.03374v2) | | [28.81](https://arxiv.org/abs/2107.03374v2) | | |
- | [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
- | [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
- | [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [gpt-3.5-175b](https://arxiv.org/abs/2303.08774v3) | | [48.1](https://arxiv.org/abs/2303.08774v3) | | |
- | [gpt-4](https://arxiv.org/abs/2303.08774v3) | | [67.0](https://arxiv.org/abs/2303.08774v3) | | |
- | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
- | [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
- | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [llama-7b](https://arxiv.org/abs/2302.13971) | | [10.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
- | [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/) | [15.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [llama-33b](https://arxiv.org/abs/2302.13971) | | [21.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [llama-65b](https://arxiv.org/abs/2302.13971) | | [23.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
- | [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [opt-7b](https://huggingface.co/facebook/opt-6.7b) | | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
- | [opt-13b](https://huggingface.co/facebook/opt-13b) | | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
- | [palm-540b](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html) | | [26.2](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
- | [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/) | | | |
- | [starcoder-base-16B](https://huggingface.co/bigcode/starcoderbase) | | [30.4](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [starcoder-16B](https://huggingface.co/bigcode/starcoder) | | [33.6](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [starcoder-16B (prompted)](https://huggingface.co/bigcode/starcoder) | | [40.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
- | [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | | |
 
 
  ## Benchmarks
 
 
 
  ## Leaderboard
 
+ | Model Name | Chatbot Arena Elo | HumanEval-Python (pass@1) | LAMBADA (zero-shot) | TriviaQA (zero-shot) |
+ | -------------------------------------------------------------------------------------- | ------------------------------------------------ | ------------------------------------------------------------------------------ | --------------------------------------------- | --------------------------------------------- |
+ | [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
+ | [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
+ | [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [chinchilla-70b](https://arxiv.org/abs/2203.15556v1) | | | [0.774](https://arxiv.org/abs/2203.15556v1) | |
+ | [code-cushman-001](https://arxiv.org/abs/2107.03374) | | [33.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [code-davinci-002](https://arxiv.org/abs/2207.10397v2) | | [65.8](https://arxiv.org/abs/2207.10397v2) | | |
+ | [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) | | [29.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [codegen-16B-multi](https://huggingface.co/Salesforce/codegen-16B-multi) | | [18.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [codegx-13b](http://keg.cs.tsinghua.edu.cn/codegeex/) | | [22.9](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [codex-12b](https://arxiv.org/abs/2107.03374v2) | | [28.81](https://arxiv.org/abs/2107.03374v2) | | |
+ | [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
+ | [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
+ | [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [gpt-3.5-175b](https://arxiv.org/abs/2303.08774v3) | | [48.1](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | |
+ | [gpt-4](https://arxiv.org/abs/2303.08774v3) | | [67.0](https://arxiv.org/abs/2303.08774v3) | | |
+ | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
+ | [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
+ | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [llama-7b](https://arxiv.org/abs/2302.13971) | | [10.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
+ | [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/) | [15.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [llama-33b](https://arxiv.org/abs/2302.13971) | | [21.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [llama-65b](https://arxiv.org/abs/2302.13971) | | [23.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
+ | [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [opt-7b](https://huggingface.co/facebook/opt-6.7b) | | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
+ | [opt-13b](https://huggingface.co/facebook/opt-13b) | | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
+ | [palm-540b](https://arxiv.org/abs/2204.02311v5) | | [26.2](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.779](https://arxiv.org/abs/2204.02311v5) | |
+ | [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
+ | [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/) | | | |
+ | [starcoder-base-16B](https://huggingface.co/bigcode/starcoderbase) | | [30.4](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [starcoder-16B](https://huggingface.co/bigcode/starcoder) | | [33.6](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [starcoder-16B (prompted)](https://huggingface.co/bigcode/starcoder) | | [40.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | |
+ | [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | | |
 
  ## Benchmarks