---
license: apache-2.0
inference: false
---

# Description
4-bit GPTQ quantization of [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0). We use the config below for quantization and evaluation, with [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) as the calibration data. The code is available in [this repository](https://github.com/IST-DASLab/marlin/tree/2f6d7c10e124b3c5fa29ff8d77d568bd7af3274c/gptq).

```yaml
bits: 4
damp_percent: 0.01
desc_act: true
exllama_config:
  version: 2
group_size: 128
quant_method: gptq
static_groups: false
sym: true
true_sequential: true
```
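The config specifies symmetric (`sym: true`) 4-bit quantization with a group size of 128, i.e. one scale shared by 128 consecutive weights. As a rough illustration only (GPTQ itself additionally minimizes layer-wise error using second-order information, which this sketch omits), a minimal round-to-nearest version of symmetric 4-bit group quantization looks like:

```python
import numpy as np

def quantize_sym_4bit(w, group_size=128):
    """Round-to-nearest symmetric 4-bit quantization with per-group scales.

    Illustrative sketch of the quantization grid only; GPTQ proper also
    applies Hessian-based error compensation while rounding.
    """
    groups = w.reshape(-1, group_size)
    # Symmetric 4-bit integers live in [-8, 7]; the scale maps the largest
    # magnitude in each group onto level 7.
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate weights from integer levels and group scales.
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_sym_4bit(w)
w_hat = dequantize(q, scale)
# Per-weight rounding error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

The stored checkpoint holds the integer levels plus the per-group scales; with `desc_act: true`, GPTQ also processes weight columns in an activation-informed order during quantization.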

## Evaluations

Below is a comprehensive evaluation performed with the [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval) evaluation harness.

| model_name                        |   core_average |   world_knowledge |   commonsense_reasoning |   language_understanding |   symbolic_problem_solving |   reading_comprehension |
|:----------------------------------|---------------:|------------------:|------------------------:|-------------------------:|---------------------------:|------------------------:|
| upstage/SOLAR-10.7B-Instruct-v1.0 |       0.594131 |          0.602579 |                0.600195 |                 0.747605 |                   0.406245 |                0.614029 |

| Category                 | Benchmark                    | Subtask                             | Accuracy | Few-shot        |
| :----------------------- | :--------------------------- | :---------------------------------- | -------: | :-------------- |
| symbolic_problem_solving | gsm8k                        |                                     | 0.638362 | 0-shot          |
| commonsense_reasoning    | copa                         |                                     |     0.84 | 0-shot          |
| commonsense_reasoning    | commonsense_qa               |                                     | 0.841933 | 0-shot          |
| commonsense_reasoning    | piqa                         |                                     | 0.818281 | 0-shot          |
| commonsense_reasoning    | bigbench_strange_stories     |                                     | 0.793103 | 0-shot          |
| commonsense_reasoning    | bigbench_strategy_qa         |                                     |  0.66623 | 0-shot          |
| language_understanding   | lambada_openai               |                                     | 0.735882 | 0-shot          |
| language_understanding   | hellaswag                    |                                     | 0.855208 | 0-shot          |
| reading_comprehension    | coqa                         |                                     | 0.222723 | 0-shot          |
| reading_comprehension    | boolq                        |                                     | 0.893884 | 0-shot          |
| world_knowledge          | triviaqa_sm_sub              |                                     | 0.628333 | 3-shot          |
| world_knowledge          | jeopardy                     | Average                             | 0.500792 | 3-shot          |
| world_knowledge          |                              | american_history                    | 0.581114 | 3-shot          |
| world_knowledge          |                              | literature                          | 0.655102 | 3-shot          |
| world_knowledge          |                              | science                             | 0.371849 | 3-shot          |
| world_knowledge          |                              | word_origins                        | 0.271233 | 3-shot          |
| world_knowledge          |                              | world_history                       | 0.624665 | 3-shot          |
| world_knowledge          | bigbench_qa_wikidata         |                                     | 0.669209 | 3-shot          |
| world_knowledge          | arc_easy                     |                                     | 0.815657 | 3-shot          |
| world_knowledge          | arc_challenge                |                                     | 0.650171 | 3-shot          |
| commonsense_reasoning    | siqa                         |                                     | 0.881781 | 3-shot          |
| language_understanding   | winograd                     |                                     | 0.897436 | 3-shot          |
| symbolic_problem_solving | bigbench_operators           |                                     | 0.595238 | 3-shot          |
| reading_comprehension    | squad                        |                                     | 0.626395 | 3-shot          |
| symbolic_problem_solving | svamp                        |                                     | 0.603333 | 5-shot          |
| world_knowledge          | mmlu                         | Average                             | 0.647028 | 5-shot          |
| world_knowledge          |                              | abstract_algebra                    |     0.29 | 5-shot          |
| world_knowledge          |                              | anatomy                             | 0.577778 | 5-shot          |
| world_knowledge          |                              | astronomy                           | 0.710526 | 5-shot          |
| world_knowledge          |                              | business_ethics                     |     0.73 | 5-shot          |
| world_knowledge          |                              | clinical_knowledge                  | 0.701887 | 5-shot          |
| world_knowledge          |                              | college_biology                     | 0.729167 | 5-shot          |
| world_knowledge          |                              | college_chemistry                   |     0.39 | 5-shot          |
| world_knowledge          |                              | college_computer_science            |      0.5 | 5-shot          |
| world_knowledge          |                              | college_mathematics                 |     0.31 | 5-shot          |
| world_knowledge          |                              | college_medicine                    |  0.66474 | 5-shot          |
| world_knowledge          |                              | college_physics                     | 0.411765 | 5-shot          |
| world_knowledge          |                              | computer_security                   |     0.72 | 5-shot          |
| world_knowledge          |                              | conceptual_physics                  | 0.582979 | 5-shot          |
| world_knowledge          |                              | econometrics                        | 0.473684 | 5-shot          |
| world_knowledge          |                              | electrical_engineering              | 0.565517 | 5-shot          |
| world_knowledge          |                              | elementary_mathematics              | 0.470899 | 5-shot          |
| world_knowledge          |                              | formal_logic                        | 0.460317 | 5-shot          |
| world_knowledge          |                              | global_facts                        |     0.33 | 5-shot          |
| world_knowledge          |                              | high_school_biology                 | 0.770968 | 5-shot          |
| world_knowledge          |                              | high_school_chemistry               | 0.448276 | 5-shot          |
| world_knowledge          |                              | high_school_computer_science        |     0.71 | 5-shot          |
| world_knowledge          |                              | high_school_european_history        | 0.830303 | 5-shot          |
| world_knowledge          |                              | high_school_geography               | 0.848485 | 5-shot          |
| world_knowledge          |                              | high_school_government_and_politics | 0.896373 | 5-shot          |
| world_knowledge          |                              | high_school_macroeconomics          | 0.646154 | 5-shot          |
| world_knowledge          |                              | high_school_mathematics             | 0.348148 | 5-shot          |
| world_knowledge          |                              | high_school_microeconomics          | 0.722689 | 5-shot          |
| world_knowledge          |                              | high_school_physics                 | 0.344371 | 5-shot          |
| world_knowledge          |                              | high_school_psychology              | 0.833028 | 5-shot          |
| world_knowledge          |                              | high_school_statistics              | 0.523148 | 5-shot          |
| world_knowledge          |                              | high_school_us_history              | 0.852941 | 5-shot          |
| world_knowledge          |                              | high_school_world_history           | 0.827004 | 5-shot          |
| world_knowledge          |                              | human_aging                         | 0.713004 | 5-shot          |
| world_knowledge          |                              | human_sexuality                     | 0.755725 | 5-shot          |
| world_knowledge          |                              | international_law                   | 0.768595 | 5-shot          |
| world_knowledge          |                              | jurisprudence                       | 0.796296 | 5-shot          |
| world_knowledge          |                              | logical_fallacies                   | 0.723926 | 5-shot          |
| world_knowledge          |                              | machine_learning                    | 0.508929 | 5-shot          |
| world_knowledge          |                              | management                          | 0.825243 | 5-shot          |
| world_knowledge          |                              | marketing                           | 0.871795 | 5-shot          |
| world_knowledge          |                              | medical_genetics                    |     0.73 | 5-shot          |
| world_knowledge          |                              | miscellaneous                       | 0.814815 | 5-shot          |
| world_knowledge          |                              | moral_disputes                      | 0.736994 | 5-shot          |
| world_knowledge          |                              | moral_scenarios                     |  0.43352 | 5-shot          |
| world_knowledge          |                              | nutrition                           | 0.728758 | 5-shot          |
| world_knowledge          |                              | philosophy                          | 0.700965 | 5-shot          |
| world_knowledge          |                              | prehistory                          | 0.765432 | 5-shot          |
| world_knowledge          |                              | professional_accounting             | 0.507092 | 5-shot          |
| world_knowledge          |                              | professional_law                    | 0.487614 | 5-shot          |
| world_knowledge          |                              | professional_medicine               | 0.727941 | 5-shot          |
| world_knowledge          |                              | professional_psychology             | 0.661765 | 5-shot          |
| world_knowledge          |                              | public_relations                    | 0.718182 | 5-shot          |
| world_knowledge          |                              | security_studies                    | 0.669388 | 5-shot          |
| world_knowledge          |                              | sociology                           |  0.81592 | 5-shot          |
| world_knowledge          |                              | us_foreign_policy                   |     0.89 | 5-shot          |
| world_knowledge          |                              | virology                            | 0.518072 | 5-shot          |
| world_knowledge          |                              | world_religions                     | 0.789474 | 5-shot          |
| symbolic_problem_solving | bigbench_dyck_languages      |                                     |    0.458 | 5-shot          |
| language_understanding   | winogrande                   |                                     | 0.826361 | 5-shot          |
| symbolic_problem_solving | agi_eval_lsat_ar             |                                     | 0.269565 | 5-shot          |
| symbolic_problem_solving | simple_arithmetic_nospaces   |                                     |    0.372 | 5-shot          |
| symbolic_problem_solving | simple_arithmetic_withspaces |                                     |    0.367 | 5-shot          |
| reading_comprehension    | agi_eval_lsat_rc             |                                     | 0.794776 | 5-shot          |
| reading_comprehension    | agi_eval_lsat_lr             |                                     | 0.641176 | 5-shot          |
| reading_comprehension    | agi_eval_sat_en              |                                     | 0.849515 | 5-shot          |
| world_knowledge          | arc_challenge                |                                     | 0.670648 | 25-shot         |
| commonsense_reasoning    | openbook_qa                  |                                     |     0.56 | 10-shot         |
| language_understanding   | hellaswag                    |                                     | 0.866461 | 10-shot         |
| symbolic_problem_solving | bigbench_cs_algorithms       |                                     | 0.652273 | 10-shot         |
| symbolic_problem_solving | bigbench_elementary_math_qa  |                                     | 0.392453 | 1-shot          |