imone committed on
Commit
08168b1
1 Parent(s): 92e6458

Update README.md

Files changed (1)
  1. README.md +11 -7
README.md CHANGED
@@ -229,15 +229,11 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem

  *: Grok results are reported by [X.AI](https://x.ai/).

- <div>
- <h3>Massive Multitask Language Understanding in Chinese (CMMLU)</h3>
- 5-shot:
+ <div align="center">
+ <h2> 中文评估结果 / Chinese Evaluations </h2>
  </div>

- | Models | STEM | Humanities | SocialSciences | Other | ChinaSpecific | Avg |
- |----------|-------|------------|----------------|-------|---------------|-------|
- | ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
- | OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |
+ ⚠️ Note that this model was not explicitly trained in Chinese (only < 0.1% of the data is in Chinese). 请注意本模型没有针对性训练中文(中文数据占比小于0.1%)。

  <div>
  <h3>Multi-Level Multi-Discipline Chinese Evaluation Suite (CEVAL)</h3>
@@ -248,6 +244,14 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
  | ChatGPT | 54.4 | 52.9 | 61.8 | 50.9 | 53.6 |
  | OpenChat | 47.29 | 45.22 | 52.49 | 48.52 | 45.08 |

+ <div>
+ <h3>Massive Multitask Language Understanding in Chinese (CMMLU, 5-shot)</h3>
+ </div>
+
+ | Models | STEM | Humanities | SocialSciences | Other | ChinaSpecific | Avg |
+ |----------|-------|------------|----------------|-------|---------------|-------|
+ | ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
+ | OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |

  <div align="center">
  <h2> Limitations </h2>