Commit e1b03d0 · committed by amirm · verified · 1 Parent(s): 283921c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +125 -0
README.md CHANGED
@@ -2,3 +2,128 @@
  license: llama3
  inference: false
  ---
+
+ # Description
+ 4-bit quantization of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) using GPTQ. We use the config below for quantization and evaluation, with [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) as the calibration data. The code is available in [this repository](https://github.com/IST-DASLab/marlin/tree/2f6d7c10e124b3c5fa29ff8d77d568bd7af3274c/gptq).
+
+ ```yaml
+ bits: 4
+ damp_percent: 0.01
+ desc_act: true
+ exllama_config:
+   version: 2
+ group_size: 128
+ quant_method: gptq
+ static_groups: false
+ sym: true
+ true_sequential: true
+ ```
+
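To make the `bits`, `group_size`, and `sym` settings above more concrete, here is a minimal sketch of symmetric, group-wise 4-bit quantization in plain Python. It uses simple round-to-nearest with a signed code range, which is an assumption made for illustration: it is not the GPTQ solver, and it does not reproduce the storage layout or the `desc_act` reordering used by the linked repository. The helper names and the 16-bit scale size are hypothetical.

```python
from typing import List, Tuple

BITS = 4          # mirrors `bits: 4`
GROUP_SIZE = 128  # mirrors `group_size: 128`


def quantize_group_symmetric(weights: List[float], bits: int = BITS) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization of one group (illustrative, not GPTQ)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits; codes live in [-8, 7]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale per group, no zero point
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE,
                 bits: int = BITS) -> List[Tuple[List[int], float]]:
    """Split a weight row into consecutive groups and quantize each one separately."""
    return [quantize_group_symmetric(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


def bits_per_weight(bits: int = BITS, group_size: int = GROUP_SIZE,
                    scale_bits: int = 16) -> float:
    """Approximate storage cost, assuming one 16-bit scale per group and nothing else."""
    return bits + scale_bits / group_size
```

Under these assumptions, each weight costs `bits_per_weight()` bits on average, and the reconstruction error inside a group is bounded by half of that group's scale. A real GPTQ checkpoint also stores zero points and group indices, so its actual overhead is somewhat higher.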
+ ## Evaluations
+
+ Below is a detailed evaluation and a comparison with [casperhansen/llama-3-8b-instruct-awq](https://huggingface.co/casperhansen/llama-3-8b-instruct-awq), run using the awesome [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval).
+
+ | model_name | core_average | world_knowledge | commonsense_reasoning | language_understanding | symbolic_problem_solving | reading_comprehension |
+ | :---------------------------------------- | -----------: | --------------: | --------------------: | ---------------------: | -----------------------: | --------------------: |
+ | ISTA-DASLab/Llama-3-8B-Instruct-GPTQ-4bit | 0.552944 | 0.584061 | 0.547598 | 0.663904 | 0.431017 | 0.538141 |
+ | casperhansen/llama-3-8b-instruct-awq | 0.531504 | 0.557663 | 0.528201 | 0.657211 | 0.391476 | 0.522971 |
+
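The gap between the two models in the summary table is easier to read as per-category differences. The sketch below copies the numbers from the table and sorts the categories by the GPTQ minus AWQ difference; the dictionary and function names are hypothetical and chosen only for this illustration.

```python
# Scores copied from the summary table above.
GPTQ = {
    "core_average": 0.552944,
    "world_knowledge": 0.584061,
    "commonsense_reasoning": 0.547598,
    "language_understanding": 0.663904,
    "symbolic_problem_solving": 0.431017,
    "reading_comprehension": 0.538141,
}
AWQ = {
    "core_average": 0.531504,
    "world_knowledge": 0.557663,
    "commonsense_reasoning": 0.528201,
    "language_understanding": 0.657211,
    "symbolic_problem_solving": 0.391476,
    "reading_comprehension": 0.522971,
}


def deltas(a: dict, b: dict) -> dict:
    """Return a[k] - b[k] for every key of a."""
    return {k: a[k] - b[k] for k in a}


if __name__ == "__main__":
    # Print categories from the largest GPTQ advantage to the smallest.
    for name, diff in sorted(deltas(GPTQ, AWQ).items(), key=lambda kv: -kv[1]):
        print(f"{name:<26} {diff:+.6f}")
```

With these numbers, every category difference is positive and the largest one is in `symbolic_problem_solving`.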
+ | Category | Benchmark | Subtask | Accuracy GPTQ | Accuracy AWQ | Few-shot setting |
+ | :----------------------- | :--------------------------- | :---------------------------------- | ------------: | -----------: | :--------------- |
+ | symbolic_problem_solving | gsm8k | | 0.721759 | 0.59818 | 0-shot |
+ | commonsense_reasoning | copa | | 0.85 | 0.84 | 0-shot |
+ | commonsense_reasoning | commonsense_qa | | 0.78706 | 0.782146 | 0-shot |
+ | commonsense_reasoning | piqa | | 0.784004 | 0.781828 | 0-shot |
+ | commonsense_reasoning | bigbench_strange_stories | | 0.764368 | 0.752874 | 0-shot |
+ | commonsense_reasoning | bigbench_strategy_qa | | 0.680647 | 0.659677 | 0-shot |
+ | language_understanding | lambada_openai | | 0.716476 | 0.717834 | 0-shot |
+ | language_understanding | hellaswag | | 0.750647 | 0.753137 | 0-shot |
+ | reading_comprehension | coqa | | 0.198797 | 0.109733 | 0-shot |
+ | reading_comprehension | boolq | | 0.8263 | 0.836391 | 0-shot |
+ | world_knowledge | triviaqa_sm_sub | | 0.590667 | 0.511333 | 3-shot |
+ | world_knowledge | jeopardy | Average | 0.4975 | 0.489451 | 3-shot |
+ | world_knowledge | | american_history | 0.535109 | 0.544794 | 3-shot |
+ | world_knowledge | | literature | 0.622449 | 0.626531 | 3-shot |
+ | world_knowledge | | science | 0.420168 | 0.390756 | 3-shot |
+ | world_knowledge | | word_origins | 0.293151 | 0.271233 | 3-shot |
+ | world_knowledge | | world_history | 0.616622 | 0.613941 | 3-shot |
+ | world_knowledge | bigbench_qa_wikidata | | 0.684366 | 0.644358 | 3-shot |
+ | world_knowledge | arc_easy | | 0.808923 | 0.808081 | 3-shot |
+ | world_knowledge | arc_challenge | | 0.571672 | 0.571672 | 3-shot |
+ | commonsense_reasoning | siqa | | 0.827533 | 0.814227 | 3-shot |
+ | language_understanding | winograd | | 0.871795 | 0.860806 | 3-shot |
+ | symbolic_problem_solving | bigbench_operators | | 0.547619 | 0.552381 | 3-shot |
+ | reading_comprehension | squad | | 0.581552 | 0.58789 | 3-shot |
+ | symbolic_problem_solving | svamp | | 0.68 | 0.57 | 5-shot |
+ | world_knowledge | mmlu | Average | 0.668279 | 0.645874 | 5-shot |
+ | world_knowledge | | abstract_algebra | 0.29 | 0.33 | 5-shot |
+ | world_knowledge | | anatomy | 0.681481 | 0.651852 | 5-shot |
+ | world_knowledge | | astronomy | 0.703947 | 0.671053 | 5-shot |
+ | world_knowledge | | business_ethics | 0.67 | 0.68 | 5-shot |
+ | world_knowledge | | clinical_knowledge | 0.750943 | 0.701887 | 5-shot |
+ | world_knowledge | | college_biology | 0.784722 | 0.729167 | 5-shot |
+ | world_knowledge | | college_chemistry | 0.47 | 0.46 | 5-shot |
+ | world_knowledge | | college_computer_science | 0.56 | 0.54 | 5-shot |
+ | world_knowledge | | college_mathematics | 0.36 | 0.28 | 5-shot |
+ | world_knowledge | | college_medicine | 0.653179 | 0.635838 | 5-shot |
+ | world_knowledge | | college_physics | 0.5 | 0.431373 | 5-shot |
+ | world_knowledge | | computer_security | 0.78 | 0.75 | 5-shot |
+ | world_knowledge | | conceptual_physics | 0.548936 | 0.557447 | 5-shot |
+ | world_knowledge | | econometrics | 0.45614 | 0.482456 | 5-shot |
+ | world_knowledge | | electrical_engineering | 0.668966 | 0.586207 | 5-shot |
+ | world_knowledge | | elementary_mathematics | 0.439153 | 0.417989 | 5-shot |
+ | world_knowledge | | formal_logic | 0.47619 | 0.412698 | 5-shot |
+ | world_knowledge | | global_facts | 0.37 | 0.41 | 5-shot |
+ | world_knowledge | | high_school_biology | 0.790323 | 0.754839 | 5-shot |
+ | world_knowledge | | high_school_chemistry | 0.581281 | 0.507389 | 5-shot |
+ | world_knowledge | | high_school_computer_science | 0.71 | 0.74 | 5-shot |
+ | world_knowledge | | high_school_european_history | 0.745455 | 0.775758 | 5-shot |
+ | world_knowledge | | high_school_geography | 0.823232 | 0.823232 | 5-shot |
+ | world_knowledge | | high_school_government_and_politics | 0.917098 | 0.875648 | 5-shot |
+ | world_knowledge | | high_school_macroeconomics | 0.635897 | 0.620513 | 5-shot |
+ | world_knowledge | | high_school_mathematics | 0.407407 | 0.392593 | 5-shot |
+ | world_knowledge | | high_school_microeconomics | 0.726891 | 0.714286 | 5-shot |
+ | world_knowledge | | high_school_physics | 0.423841 | 0.410596 | 5-shot |
+ | world_knowledge | | high_school_psychology | 0.842202 | 0.838532 | 5-shot |
+ | world_knowledge | | high_school_statistics | 0.592593 | 0.513889 | 5-shot |
+ | world_knowledge | | high_school_us_history | 0.852941 | 0.852941 | 5-shot |
+ | world_knowledge | | high_school_world_history | 0.843882 | 0.831224 | 5-shot |
+ | world_knowledge | | human_aging | 0.717489 | 0.713004 | 5-shot |
+ | world_knowledge | | human_sexuality | 0.763359 | 0.70229 | 5-shot |
+ | world_knowledge | | international_law | 0.793388 | 0.77686 | 5-shot |
+ | world_knowledge | | jurisprudence | 0.814815 | 0.768519 | 5-shot |
+ | world_knowledge | | logical_fallacies | 0.754601 | 0.773006 | 5-shot |
+ | world_knowledge | | machine_learning | 0.553571 | 0.508929 | 5-shot |
+ | world_knowledge | | management | 0.84466 | 0.834951 | 5-shot |
+ | world_knowledge | | marketing | 0.92735 | 0.888889 | 5-shot |
+ | world_knowledge | | medical_genetics | 0.81 | 0.78 | 5-shot |
+ | world_knowledge | | miscellaneous | 0.825032 | 0.799489 | 5-shot |
+ | world_knowledge | | moral_disputes | 0.739884 | 0.722543 | 5-shot |
+ | world_knowledge | | moral_scenarios | 0.437989 | 0.38324 | 5-shot |
+ | world_knowledge | | nutrition | 0.764706 | 0.735294 | 5-shot |
+ | world_knowledge | | philosophy | 0.733119 | 0.713826 | 5-shot |
+ | world_knowledge | | prehistory | 0.719136 | 0.719136 | 5-shot |
+ | world_knowledge | | professional_accounting | 0.475177 | 0.485816 | 5-shot |
+ | world_knowledge | | professional_law | 0.480443 | 0.449153 | 5-shot |
+ | world_knowledge | | professional_medicine | 0.709559 | 0.676471 | 5-shot |
+ | world_knowledge | | professional_psychology | 0.694444 | 0.676471 | 5-shot |
+ | world_knowledge | | public_relations | 0.7 | 0.6 | 5-shot |
+ | world_knowledge | | security_studies | 0.730612 | 0.718367 | 5-shot |
+ | world_knowledge | | sociology | 0.830846 | 0.845771 | 5-shot |
+ | world_knowledge | | us_foreign_policy | 0.86 | 0.85 | 5-shot |
+ | world_knowledge | | virology | 0.542169 | 0.518072 | 5-shot |
+ | world_knowledge | | world_religions | 0.812865 | 0.795322 | 5-shot |
+ | symbolic_problem_solving | bigbench_dyck_languages | | 0.086 | 0.045 | 5-shot |
+ | language_understanding | winogrande | | 0.764009 | 0.759274 | 5-shot |
+ | symbolic_problem_solving | agi_eval_lsat_ar | | 0.3 | 0.278261 | 5-shot |
+ | symbolic_problem_solving | simple_arithmetic_nospaces | | 0.466 | 0.458 | 5-shot |
+ | symbolic_problem_solving | simple_arithmetic_withspaces | | 0.502 | 0.496 | 5-shot |
+ | reading_comprehension | agi_eval_lsat_rc | | 0.731343 | 0.708955 | 5-shot |
+ | reading_comprehension | agi_eval_lsat_lr | | 0.554902 | 0.560784 | 5-shot |
+ | reading_comprehension | agi_eval_sat_en | | 0.81068 | 0.805825 | 5-shot |
+ | world_knowledge | arc_challenge | | 0.582765 | 0.591297 | 25-shot |
+ | commonsense_reasoning | openbook_qa | | 0.478 | 0.468 | 10-shot |
+ | language_understanding | hellaswag | | 0.769468 | 0.771062 | 10-shot |
+ | | bigbench_cs_algorithms | | 0.715151 | 0.687879 | 10-shot |
+ | symbolic_problem_solving | bigbench_elementary_math_qa | | 0.533569 | 0.530922 | 1-shot |
+
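A per-benchmark table of this size is easier to summarize programmatically. The following sketch, with hypothetical helper names, parses rows in the six-column format used above and counts how often each model scores higher. Only four rows are embedded here as a sample; pasting the full table into `SAMPLE` would tally every row, including the `Average` rows, which you may want to filter out first.

```python
from typing import List, Tuple

# Four rows copied from the detailed table above, used as a small sample.
SAMPLE = """
| symbolic_problem_solving | gsm8k | | 0.721759 | 0.59818 | 0-shot |
| language_understanding | lambada_openai | | 0.716476 | 0.717834 | 0-shot |
| world_knowledge | arc_challenge | | 0.571672 | 0.571672 | 3-shot |
| reading_comprehension | coqa | | 0.198797 | 0.109733 | 0-shot |
"""

Row = Tuple[str, str, str, float, float, str]


def parse_rows(text: str) -> List[Row]:
    """Parse table rows, skipping headers, separators, and blank lines."""
    rows: List[Row] = []
    for raw in text.splitlines():
        line = raw.lstrip("+").strip()  # tolerate a leading diff marker
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [c.strip() for c in line[1:-1].split("|")]
        if len(cells) != 6:
            continue
        category, benchmark, subtask, gptq, awq, shots = cells
        try:
            rows.append((category, benchmark, subtask, float(gptq), float(awq), shots))
        except ValueError:
            continue  # header or separator row: accuracy cells are not numbers
    return rows


def tally(rows: List[Row]) -> dict:
    """Count rows where GPTQ is higher, AWQ is higher, or both are equal."""
    counts = {"gptq": 0, "awq": 0, "tie": 0}
    for _, _, _, gptq, awq, _ in rows:
        if gptq > awq:
            counts["gptq"] += 1
        elif awq > gptq:
            counts["awq"] += 1
        else:
            counts["tie"] += 1
    return counts


if __name__ == "__main__":
    print(tally(parse_rows(SAMPLE)))
```

A win count treats a 0.001 gap the same as a 0.1 gap, so it complements the category averages rather than replacing them.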