The main purpose of this model is to validate the usefulness of the thomas-yanxin/MT-SFT-ShareGPT dataset, i.e., to test the idea that data quality is all you need. We found that when the data is curated meticulously through a better data governance approach, the resulting model improves substantially, even when trained with supervised fine-tuning (SFT) alone.
Here are the results from our OpenCompass evaluation:
| Classification | Benchmark | XinYuan-Qwen2-7B |
|---|---|---|
| English | MMLU | 68.71 |
| English | MMLU-Pro | 30.56 |
| English | Theorem QA | 25.3 |
| English | GPQA | 29.2 |
| English | BBH | 60.3 |
| English | IFEval (Prompt Strict-Acc.) | 39.2 |
| English | ARC-C | 87.5 |
| Math | GSM8K | 75.4 |
| Math | MATH | 34.76 |
| Chinese | C-EVAL | 82.0 |
| Chinese | CMMLU | 77.9 |
| Code | MBPP | 50.6 |
| Code | HumanEval | 70.1 |
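
For a quick per-category view of the table above, the scores can be averaged within each classification. The following is a minimal sketch, not part of the original evaluation: the `RESULTS` list and the `category_means` helper are hypothetical names introduced here, and the unweighted mean mixes benchmarks with different metrics and difficulty, so it should be read only as a rough summary.

```python
from collections import defaultdict
from statistics import mean

# Scores copied from the results table: (classification, benchmark, score).
RESULTS = [
    ("English", "MMLU", 68.71),
    ("English", "MMLU-Pro", 30.56),
    ("English", "Theorem QA", 25.3),
    ("English", "GPQA", 29.2),
    ("English", "BBH", 60.3),
    ("English", "IFEval (Prompt Strict-Acc.)", 39.2),
    ("English", "ARC-C", 87.5),
    ("Math", "GSM8K", 75.4),
    ("Math", "MATH", 34.76),
    ("Chinese", "C-EVAL", 82.0),
    ("Chinese", "CMMLU", 77.9),
    ("Code", "MBPP", 50.6),
    ("Code", "HumanEval", 70.1),
]


def category_means(results):
    """Return the unweighted mean score for each classification."""
    grouped = defaultdict(list)
    for category, _benchmark, score in results:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


if __name__ == "__main__":
    # Print one line per classification, rounded to two decimals.
    for category, avg in category_means(RESULTS).items():
        print(f"{category}: {avg:.2f}")
```

Keeping the raw rows in one list makes it straightforward to add another model's column later and compare the same per-category summaries.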