Update README.md
README.md
CHANGED
@@ -97,7 +97,7 @@ It currently stands as the **best open-source flagship non-thinking model**, riv
 | | MMLU-Redux (EM) | 92.37 | 91.58 | **92.75** | __<span style="color:red">94.67</span>__ | 92.25 |
 | | MMLU-Pro | __<span style="color:red">83.25</span>__ | 81.03 | 81.94 | **82.13** | 82.04 |
 | **Knowledge** | **STEM** | | | | | |
-| | MMLU-Pro-Stem |
+| | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span> | **88.5** |
 | | OlympiadBench-stem | 87.83 | 79.13 | 78.26 | **89.57** | __<span style="color:red">91.3</span>__ |
 | | GPQA-Diamond | __<span style="color:red">76.23</span>__ | **73.93** | 71.31 | 71.81 | 72.98 |
 | **Coding** | **Code Generation** | | | | | |