Clémentine
committed on
Commit • 6fa8358
1 Parent(s): ae4f37a
rephrase
src/index.html CHANGED  +1 -6
@@ -49,11 +49,6 @@
     </d-front-matter>
     <d-title>
       <h1 class="l-page" style="text-align: center;">Open-LLM performances are plateauing, let’s make it steep again </h1>
-      <div id="title-plot" class="l-body l-screen">
-        <figure>
-          <img src="assets/images/banner.png" alt="Banner">
-        </figure>
-      </div>
     </d-title>
     <d-byline></d-byline>
     <d-article>
@@ -216,7 +211,7 @@
 
       <h3>What do the rankings look like?</h3>
 
-      <p>Taking a look at the top 10 models on the previous version of the Open LLM Leaderboard, and comparing with this updated version, some models appear to have a relatively stable ranking (in bold below): Qwen-2-72B instruct, Meta’s Llama3-70B
+      <p>Taking a look at the top 10 models on the previous version of the Open LLM Leaderboard, and comparing with this updated version, some models appear to have a relatively stable ranking (in bold below): Qwen-2-72B instruct, Meta’s Llama3-70B instruct, 01-ai’s Yi-1.5-34B chat, Cohere’s Command R + model, and lastly Smaug-72B, from AbacusAI.</p>
       <p>We’ve been particularly impressed by Qwen2-72B-Instruct, one step above other models (notably thanks to its performance in math, long range reasoning, and knowledge)</p>
       <p>The current second best model, Llama-3-70B-Instruct, interestingly loses 15 points to its pretrained version counterpart on GPQA, which begs the question whether the particularly extensive instruction fine-tuning done by the Meta team on this model affected some expert/graduate level knowledge.</p>
       <p>Also very interesting is the fact that a new challenger climbed the ranks to reach 3rd place despite its smaller size. With only 13B parameters, Microsoft’s Phi-3-medium-4K-instruct model shows a performance equivalent to models 2 to 4 times its size. It would be very interesting to have more information on the training procedure for Phi or an independant reproduction from an external team with open training recipes/datasets.</p>