ludwigstumpp/llm-leaderboard

Ludwig Stumpp committed on May 18, 2023

Commit 15b03fa • 1 Parent(s): 265c39e

Add text-davinci-003 results on HellaSwag and WinoGrande zero-shot

README.md +1 -1

@@ -30,7 +30,7 @@ https://huggingface.co/spaces/ludwigstumpp/llm-leaderboard
 | [gal-120b](https://arxiv.org/abs/2211.09085v1)                                                              | Meta AI             | no    |                                                  |                                                                      |                                                                    |                                                                 |                                                                                 |                                               |                                                                 | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) |                                                                      |                                               |                                                                 |                                                                    |                                                                 |                                                                 |
 | [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165)                                                        | OpenAI              | no    |                                                  | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                                                    |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                    |                                                                 |                                                                 |
 | [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165)                                                    | OpenAI              | no    |                                                  | [0.793](https://arxiv.org/abs/2005.14165)                            | [0.789](https://arxiv.org/abs/2005.14165)                          |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          | [0.439](https://arxiv.org/abs/2005.14165)                            |                                               |                                                                 | [0.702](https://arxiv.org/abs/2005.14165v4)                        |                                                                 |                                                                 |
-| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3)                                       | OpenAI              | no    |                                                  | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                                                    |                                                                 | [0.481](https://arxiv.org/abs/2303.08774v3)                                     | [0.762](https://arxiv.org/abs/2303.08774v3)   |                                                                 |                                                                                          | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                    |                                                                 | [0.816](https://arxiv.org/abs/2303.08774v3)                     |
+| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3)                                       | OpenAI              | no    |                                                  | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.834](https://gpt4all.io/reports/GPT4All_Technical_Report_3.pdf) |                                                                 | [0.481](https://arxiv.org/abs/2303.08774v3)                                     | [0.762](https://arxiv.org/abs/2303.08774v3)   |                                                                 |                                                                                          | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 | [0.758](https://gpt4all.io/reports/GPT4All_Technical_Report_3.pdf) |                                                                 | [0.816](https://arxiv.org/abs/2303.08774v3)                     |
 | [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers)             | OpenAI              | no    |                                                  |                                                                      |                                                                    |                                                                 | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations)      |                                               |                                                                 |                                                                                          |                                                                      |                                               |                                                                 |                                                                    |                                                                 |                                                                 |
 | [gpt-4](https://arxiv.org/abs/2303.08774v3)                                                                 | OpenAI              | no    |                                                  | [0.953](https://arxiv.org/abs/2303.08774v3)                          |                                                                    |                                                                 | [0.670](https://arxiv.org/abs/2303.08774v3)                                     |                                               |                                                                 |                                                                                          | [0.864](https://arxiv.org/abs/2303.08774v3)                          |                                               |                                                                 |                                                                    |                                                                 | [0.875](https://arxiv.org/abs/2303.08774v3)                     |
 | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b)                                              | EleutherAI          | yes   |                                                  | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b)                      |                                                                 |                                                                                 | [0.719](https://www.mosaicml.com/blog/mpt-7b) |                                                                 | [0.269](https://www.mosaicml.com/blog/mpt-7b)                                            | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                    |                                                                 |                                                                 |