---
license: llama3
inference: false
---

# Description
4-bit quantization of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) using GPTQ. We use the config below for quantization and evaluation, with [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) as the calibration data. The code is available in [this repository](https://github.com/IST-DASLab/marlin/tree/2f6d7c10e124b3c5fa29ff8d77d568bd7af3274c/gptq).

```yaml
bits: 4
damp_percent: 0.01
desc_act: true
exllama_config:
 version: 2
group_size: 128
quant_method: gptq
static_groups: false
sym: true
true_sequential: true
```
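As a rough sanity check on storage cost, the config above implies about 4.1 effective bits per weight: 4-bit values plus one fp16 scale per 128-weight group. This is an illustrative back-of-the-envelope sketch, not part of the GPTQ tooling; it assumes fp16 scales, no stored zero-points (since `sym: true`), and ignores the small `g_idx` overhead from `desc_act: true`.

```python
# Effective bits per weight for the quantization config above
# (illustrative estimate under the assumptions stated in the text).

bits = 4          # quantized weight width (bits: 4)
group_size = 128  # weights sharing one scale (group_size: 128)
scale_bits = 16   # fp16 scale per group (assumption)

effective_bits = bits + scale_bits / group_size
print(f"{effective_bits:.3f} bits/weight")            # 4.125 bits/weight

# Compression relative to fp16 weights:
print(f"{16 / effective_bits:.2f}x smaller than fp16")  # 3.88x smaller than fp16
```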

## Evaluations

Below is a comprehensive evaluation and also comparison with [casperhansen/llama-3-8b-instruct-awq](https://huggingface.co/casperhansen/llama-3-8b-instruct-awq) using the awesome [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval).

| model_name                                | core_average | world_knowledge | commonsense_reasoning | language_understanding | symbolic_problem_solving | reading_comprehension |
| :---------------------------------------- | -----------: | --------------: | --------------------: | ---------------------: | -----------------------: | --------------------: |
| ISTA-DASLab/Llama-3-8B-Instruct-GPTQ-4bit |     0.552944 |        0.584061 |              0.547598 |               0.663904 |                 0.431017 |              0.538141 |
| casperhansen/llama-3-8b-instruct-awq      |     0.531504 |        0.557663 |              0.528201 |               0.657211 |                 0.391476 |              0.522971 |
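For convenience, the per-category gap between the two models can be computed directly from the summary table above; the GPTQ model leads in every category, with the largest margin on symbolic problem solving:

```python
# Category-level deltas (GPTQ minus AWQ), values copied from the summary table.
gptq = {"core_average": 0.552944, "world_knowledge": 0.584061,
        "commonsense_reasoning": 0.547598, "language_understanding": 0.663904,
        "symbolic_problem_solving": 0.431017, "reading_comprehension": 0.538141}
awq = {"core_average": 0.531504, "world_knowledge": 0.557663,
       "commonsense_reasoning": 0.528201, "language_understanding": 0.657211,
       "symbolic_problem_solving": 0.391476, "reading_comprehension": 0.522971}

for category in gptq:
    print(f"{category:>26}: {gptq[category] - awq[category]:+.4f}")
```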

| Category                 | Benchmark                    | Subtask                             | Accuracy GPTQ | Accuracy AWQ | Few-shot setting |
| :----------------------- | :--------------------------- | :---------------------------------- | ------------: | -----------: | :-------------- |
| symbolic_problem_solving | gsm8k                        |                                     |      0.721759 |      0.59818 | 0-shot          |
| commonsense_reasoning    | copa                         |                                     |          0.85 |         0.84 | 0-shot          |
| commonsense_reasoning    | commonsense_qa               |                                     |       0.78706 |     0.782146 | 0-shot          |
| commonsense_reasoning    | piqa                         |                                     |      0.784004 |     0.781828 | 0-shot          |
| commonsense_reasoning    | bigbench_strange_stories     |                                     |      0.764368 |     0.752874 | 0-shot          |
| commonsense_reasoning    | bigbench_strategy_qa         |                                     |      0.680647 |     0.659677 | 0-shot          |
| language_understanding   | lambada_openai               |                                     |      0.716476 |     0.717834 | 0-shot          |
| language_understanding   | hellaswag                    |                                     |      0.750647 |     0.753137 | 0-shot          |
| reading_comprehension    | coqa                         |                                     |      0.198797 |     0.109733 | 0-shot          |
| reading_comprehension    | boolq                        |                                     |        0.8263 |     0.836391 | 0-shot          |
| world_knowledge          | triviaqa_sm_sub              |                                     |      0.590667 |     0.511333 | 3-shot          |
| world_knowledge          | jeopardy                     | Average                             |        0.4975 |     0.489451 | 3-shot          |
| world_knowledge          |                              | american_history                    |      0.535109 |     0.544794 | 3-shot          |
| world_knowledge          |                              | literature                          |      0.622449 |     0.626531 | 3-shot          |
| world_knowledge          |                              | science                             |      0.420168 |     0.390756 | 3-shot          |
| world_knowledge          |                              | word_origins                        |      0.293151 |     0.271233 | 3-shot          |
| world_knowledge          |                              | world_history                       |      0.616622 |     0.613941 | 3-shot          |
| world_knowledge          | bigbench_qa_wikidata         |                                     |      0.684366 |     0.644358 | 3-shot          |
| world_knowledge          | arc_easy                     |                                     |      0.808923 |     0.808081 | 3-shot          |
| world_knowledge          | arc_challenge                |                                     |      0.571672 |     0.571672 | 3-shot          |
| commonsense_reasoning    | siqa                         |                                     |      0.827533 |     0.814227 | 3-shot          |
| language_understanding   | winograd                     |                                     |      0.871795 |     0.860806 | 3-shot          |
| symbolic_problem_solving | bigbench_operators           |                                     |      0.547619 |     0.552381 | 3-shot          |
| reading_comprehension    | squad                        |                                     |      0.581552 |      0.58789 | 3-shot          |
| symbolic_problem_solving | svamp                        |                                     |          0.68 |         0.57 | 5-shot          |
| world_knowledge          | mmlu                         | Average                             |      0.668279 |     0.645874 | 5-shot          |
| world_knowledge          |                              | abstract_algebra                    |          0.29 |         0.33 | 5-shot          |
| world_knowledge          |                              | anatomy                             |      0.681481 |     0.651852 | 5-shot          |
| world_knowledge          |                              | astronomy                           |      0.703947 |     0.671053 | 5-shot          |
| world_knowledge          |                              | business_ethics                     |          0.67 |         0.68 | 5-shot          |
| world_knowledge          |                              | clinical_knowledge                  |      0.750943 |     0.701887 | 5-shot          |
| world_knowledge          |                              | college_biology                     |      0.784722 |     0.729167 | 5-shot          |
| world_knowledge          |                              | college_chemistry                   |          0.47 |         0.46 | 5-shot          |
| world_knowledge          |                              | college_computer_science            |          0.56 |         0.54 | 5-shot          |
| world_knowledge          |                              | college_mathematics                 |          0.36 |         0.28 | 5-shot          |
| world_knowledge          |                              | college_medicine                    |      0.653179 |     0.635838 | 5-shot          |
| world_knowledge          |                              | college_physics                     |           0.5 |     0.431373 | 5-shot          |
| world_knowledge          |                              | computer_security                   |          0.78 |         0.75 | 5-shot          |
| world_knowledge          |                              | conceptual_physics                  |      0.548936 |     0.557447 | 5-shot          |
| world_knowledge          |                              | econometrics                        |       0.45614 |     0.482456 | 5-shot          |
| world_knowledge          |                              | electrical_engineering              |      0.668966 |     0.586207 | 5-shot          |
| world_knowledge          |                              | elementary_mathematics              |      0.439153 |     0.417989 | 5-shot          |
| world_knowledge          |                              | formal_logic                        |       0.47619 |     0.412698 | 5-shot          |
| world_knowledge          |                              | global_facts                        |          0.37 |         0.41 | 5-shot          |
| world_knowledge          |                              | high_school_biology                 |      0.790323 |     0.754839 | 5-shot          |
| world_knowledge          |                              | high_school_chemistry               |      0.581281 |     0.507389 | 5-shot          |
| world_knowledge          |                              | high_school_computer_science        |          0.71 |         0.74 | 5-shot          |
| world_knowledge          |                              | high_school_european_history        |      0.745455 |     0.775758 | 5-shot          |
| world_knowledge          |                              | high_school_geography               |      0.823232 |     0.823232 | 5-shot          |
| world_knowledge          |                              | high_school_government_and_politics |      0.917098 |     0.875648 | 5-shot          |
| world_knowledge          |                              | high_school_macroeconomics          |      0.635897 |     0.620513 | 5-shot          |
| world_knowledge          |                              | high_school_mathematics             |      0.407407 |     0.392593 | 5-shot          |
| world_knowledge          |                              | high_school_microeconomics          |      0.726891 |     0.714286 | 5-shot          |
| world_knowledge          |                              | high_school_physics                 |      0.423841 |     0.410596 | 5-shot          |
| world_knowledge          |                              | high_school_psychology              |      0.842202 |     0.838532 | 5-shot          |
| world_knowledge          |                              | high_school_statistics              |      0.592593 |     0.513889 | 5-shot          |
| world_knowledge          |                              | high_school_us_history              |      0.852941 |     0.852941 | 5-shot          |
| world_knowledge          |                              | high_school_world_history           |      0.843882 |     0.831224 | 5-shot          |
| world_knowledge          |                              | human_aging                         |      0.717489 |     0.713004 | 5-shot          |
| world_knowledge          |                              | human_sexuality                     |      0.763359 |      0.70229 | 5-shot          |
| world_knowledge          |                              | international_law                   |      0.793388 |      0.77686 | 5-shot          |
| world_knowledge          |                              | jurisprudence                       |      0.814815 |     0.768519 | 5-shot          |
| world_knowledge          |                              | logical_fallacies                   |      0.754601 |     0.773006 | 5-shot          |
| world_knowledge          |                              | machine_learning                    |      0.553571 |     0.508929 | 5-shot          |
| world_knowledge          |                              | management                          |       0.84466 |     0.834951 | 5-shot          |
| world_knowledge          |                              | marketing                           |       0.92735 |     0.888889 | 5-shot          |
| world_knowledge          |                              | medical_genetics                    |          0.81 |         0.78 | 5-shot          |
| world_knowledge          |                              | miscellaneous                       |      0.825032 |     0.799489 | 5-shot          |
| world_knowledge          |                              | moral_disputes                      |      0.739884 |     0.722543 | 5-shot          |
| world_knowledge          |                              | moral_scenarios                     |      0.437989 |      0.38324 | 5-shot          |
| world_knowledge          |                              | nutrition                           |      0.764706 |     0.735294 | 5-shot          |
| world_knowledge          |                              | philosophy                          |      0.733119 |     0.713826 | 5-shot          |
| world_knowledge          |                              | prehistory                          |      0.719136 |     0.719136 | 5-shot          |
| world_knowledge          |                              | professional_accounting             |      0.475177 |     0.485816 | 5-shot          |
| world_knowledge          |                              | professional_law                    |      0.480443 |     0.449153 | 5-shot          |
| world_knowledge          |                              | professional_medicine               |      0.709559 |     0.676471 | 5-shot          |
| world_knowledge          |                              | professional_psychology             |      0.694444 |     0.676471 | 5-shot          |
| world_knowledge          |                              | public_relations                    |           0.7 |          0.6 | 5-shot          |
| world_knowledge          |                              | security_studies                    |      0.730612 |     0.718367 | 5-shot          |
| world_knowledge          |                              | sociology                           |      0.830846 |     0.845771 | 5-shot          |
| world_knowledge          |                              | us_foreign_policy                   |          0.86 |         0.85 | 5-shot          |
| world_knowledge          |                              | virology                            |      0.542169 |     0.518072 | 5-shot          |
| world_knowledge          |                              | world_religions                     |      0.812865 |     0.795322 | 5-shot          |
| symbolic_problem_solving | bigbench_dyck_languages      |                                     |         0.086 |        0.045 | 5-shot          |
| language_understanding   | winogrande                   |                                     |      0.764009 |     0.759274 | 5-shot          |
| symbolic_problem_solving | agi_eval_lsat_ar             |                                     |           0.3 |     0.278261 | 5-shot          |
| symbolic_problem_solving | simple_arithmetic_nospaces   |                                     |         0.466 |        0.458 | 5-shot          |
| symbolic_problem_solving | simple_arithmetic_withspaces |                                     |         0.502 |        0.496 | 5-shot          |
| reading_comprehension    | agi_eval_lsat_rc             |                                     |      0.731343 |     0.708955 | 5-shot          |
| reading_comprehension    | agi_eval_lsat_lr             |                                     |      0.554902 |     0.560784 | 5-shot          |
| reading_comprehension    | agi_eval_sat_en              |                                     |       0.81068 |     0.805825 | 5-shot          |
| world_knowledge          | arc_challenge                |                                     |      0.582765 |     0.591297 | 25-shot         |
| commonsense_reasoning    | openbook_qa                  |                                     |         0.478 |        0.468 | 10-shot         |
| language_understanding   | hellaswag                    |                                     |      0.769468 |     0.771062 | 10-shot         |
| symbolic_problem_solving | bigbench_cs_algorithms       |                                     |      0.715151 |     0.687879 | 10-shot         |
| symbolic_problem_solving | bigbench_elementary_math_qa  |                                     |      0.533569 |     0.530922 | 1-shot          |