nm-research committed "Update README.md" (commit 4329992, parent: 11dac13)

README.md CHANGED
```diff
@@ -27,7 +27,7 @@ tags:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
-It achieves an average score of
+It achieves an average score of 73.05 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark version 1 and 41.44 on version 2, whereas the unquantized model achieves 73.16 on version 1 and 41.40 on version 2.
 
 ### Model Optimizations
 
@@ -97,39 +97,39 @@ lm_eval \
 </td>
 <td>MMLU (5-shot)
 </td>
-<td>
+<td>74.24
 </td>
-<td>
+<td>73.84
 </td>
-<td>99.
+<td>99.5%
 </td>
 </tr>
 <tr>
 <td>ARC Challenge (25-shot)
 </td>
-<td>
+<td>63.40
 </td>
-<td>
+<td>63.23
 </td>
-<td>
+<td>99.7%
 </td>
 </tr>
 <tr>
 <td>GSM-8K (5-shot, strict-match)
 </td>
-<td>
+<td>80.36
 </td>
-<td>
+<td>80.74
 </td>
-<td>
+<td>100.5%
 </td>
 </tr>
 <tr>
 <td>Hellaswag (10-shot)
 </td>
-<td>
+<td>81.52
 </td>
-<td>
+<td>81.06
 </td>
 <td>99.4%
 </td>
@@ -137,31 +137,31 @@ lm_eval \
 <tr>
 <td>Winogrande (5-shot)
 </td>
-<td>
+<td>74.66
 </td>
-<td>
+<td>74.82
 </td>
-<td>
+<td>100.2%
 </td>
 </tr>
 <tr>
 <td>TruthfulQA (0-shot, mc2)
 </td>
-<td>
+<td>64.76
 </td>
-<td>
+<td>64.58
 </td>
-<td>
+<td>99.7%
 </td>
 </tr>
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>
+<td><strong>73.16</strong>
 </td>
-<td><strong>
+<td><strong>73.05</strong>
 </td>
-<td><strong>
+<td><strong>99.9%</strong>
 </td>
 </tr>
 <tr>
```