Ludwig Stumpp committed on
Commit
15b03fa
1 Parent(s): 265c39e

Add text-davinci-003 results on HellaSwag and WinoGrande zero-shot

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -30,7 +30,7 @@ https://huggingface.co/spaces/ludwigstumpp/llm-leaderboard
  | [gal-120b](https://arxiv.org/abs/2211.09085v1) | Meta AI | no | | | | | | | | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) | | | | | | |
  | [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165) | OpenAI | no | | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | | | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | |
  | [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165) | OpenAI | no | | [0.793](https://arxiv.org/abs/2005.14165) | [0.789](https://arxiv.org/abs/2005.14165) | | | | | | [0.439](https://arxiv.org/abs/2005.14165) | | | [0.702](https://arxiv.org/abs/2005.14165v4) | | |
- | [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3) | OpenAI | no | | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | [0.481](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.816](https://arxiv.org/abs/2303.08774v3) |
+ | [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3) | OpenAI | no | | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.834](https://gpt4all.io/reports/GPT4All_Technical_Report_3.pdf) | | [0.481](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | [0.758](https://gpt4all.io/reports/GPT4All_Technical_Report_3.pdf) | | [0.816](https://arxiv.org/abs/2303.08774v3) |
  | [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers) | OpenAI | no | | | | | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations) | | | | | | | | | |
  | [gpt-4](https://arxiv.org/abs/2303.08774v3) | OpenAI | no | | [0.953](https://arxiv.org/abs/2303.08774v3) | | | [0.670](https://arxiv.org/abs/2303.08774v3) | | | | [0.864](https://arxiv.org/abs/2303.08774v3) | | | | | [0.875](https://arxiv.org/abs/2303.08774v3) |
  | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | EleutherAI | yes | | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b) | | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | | [0.269](https://www.mosaicml.com/blog/mpt-7b) | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) | | | | |
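
A quick way to check that a one-row edit like this fills in only the intended cells, and leaves the column count of the table row unchanged, is to split the old and new rows on the pipe character and compare them cell by cell. The sketch below is illustrative only and not part of the repository: `split_row` and `changed_cells` are hypothetical helper names, and `OLD_ROW` / `NEW_ROW` are abbreviated copies of the changed row with the Markdown links reduced to their scores. It assumes no cell contains an escaped pipe.

```python
def split_row(row: str) -> list[str]:
    """Split one Markdown table row into its stripped cell contents."""
    inner = row.strip()
    # Drop the leading and trailing pipe so they do not create extra empty cells.
    if inner.startswith("|"):
        inner = inner[1:]
    if inner.endswith("|"):
        inner = inner[:-1]
    return [cell.strip() for cell in inner.split("|")]


def changed_cells(old_row: str, new_row: str) -> list[tuple[int, str, str]]:
    """Return (zero-based column index, old value, new value) for each differing cell."""
    old_cells, new_cells = split_row(old_row), split_row(new_row)
    if len(old_cells) != len(new_cells):
        raise ValueError(
            f"column count changed: {len(old_cells)} -> {len(new_cells)}"
        )
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(old_cells, new_cells))
        if old != new
    ]


# Abbreviated versions of the removed and added rows (links stripped; an assumption
# made for readability, the real rows carry a source URL per score).
OLD_ROW = "| text-davinci-003 | OpenAI | no | | 0.822 | | | 0.481 | 0.762 | | | 0.569 | | | | | 0.816 |"
NEW_ROW = "| text-davinci-003 | OpenAI | no | | 0.822 | 0.834 | | 0.481 | 0.762 | | | 0.569 | | | 0.758 | | 0.816 |"

for index, old, new in changed_cells(OLD_ROW, NEW_ROW):
    print(f"column {index}: {old!r} -> {new!r}")
```

For these rows the comparison reports exactly two changed cells, both previously empty, which are presumably the HellaSwag and WinoGrande columns named in the commit message (the table header is not part of this hunk, so the column names cannot be confirmed from the diff alone).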