Tags: Text Generation · Transformers · Safetensors · bailing_moe · conversational · custom_code
Commit f33e0e0 by zzqsmall (verified) · 1 parent: 96513ea

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -105,8 +105,8 @@ It currently stands as the **best open-source flagship non-thinking model**, riv
 | | mbpp | 90.69 | 89.96 | **91.72** | 91.01 | __<span style="color:red">96.87</span>__ |
 | | LiveCodeBench (2408-2505) | 48.02 | 48.95 | **48.57** | 45.43 | __<span style="color:red">61.68</span>__ |
 | | CodeForces-rating | 1582 | 1574 | 1120 | **1675** | __<span style="color:red">1901</span>__ |
-| **Coding** | **Software Development** | | | | | |
 | | BIRD_SQL | 44.88 | 46.45 | 43.97 | __<span style="color:red">54.76</span>__ | **52.38** |
+| **Coding** | **Software Development** | | | | | |
 | | ArtifactsBench | 43.29 | 44.87 | 41.04 | __<span style="color:red">60.28</span>__ | **59.31** |
 | | FullStack Bench | **55.48** | 54.00 | 50.92 | 48.19 | __<span style="color:red">56.55</span>__ |
 | | Aider | **88.16** | 85.34 | 84.40 | __<span style="color:red">89.85</span>__ | 83.65 |
@@ -121,7 +121,7 @@ It currently stands as the **best open-source flagship non-thinking model**, riv
 | | OptMATH | 35.99 | 35.84 | 39.16 | **42.77** | __<span style="color:red">57.68</span>__ |
 | **General Reasoning** | | | | | | |
 | | BBEH | **42.86** | 34.83 | 39.75 | 29.08 | __<span style="color:red">47.34</span>__ |
-| | KOR-Bench | 73.76 | 73.20 | 70.56 | 59.68 | __<span style="color:red">76.00</span>__ |
+| | KOR-Bench | **73.76** | 73.20 | 70.56 | 59.68 | __<span style="color:red">76.00</span>__ |
 | | ARC-AGI-1 | 14.69 | **22.19** | 14.06 | 18.94 | __<span style="color:red">43.81</span>__ |
 | | ZebraLogic | 81.6 | **85.5** | 57.3 | 70.2 | __<span style="color:red">90.8</span>__ |
 | **Agent** | | | | | | |