xianchaowu committed
Commit 41592be
1 Parent(s): 78d14d8

checkpoint-800 for llama2-13b-chat

Files changed (2)
  1. README.md +127 -55
  2. adapter_model.bin +1 -1
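For reference, this exact revision can be pulled by its commit hash. A minimal sketch using huggingface_hub (the repository id is a placeholder, since this page does not show it):

```python
# Hypothetical download of this exact revision (commit 41592be) of the adapter repo.
# REPO_ID is a placeholder; substitute the actual repository id.
from huggingface_hub import snapshot_download

REPO_ID = "<this-adapter-repo>"
local_dir = snapshot_download(repo_id=REPO_ID, revision="41592be")
print(local_dir)  # contains the updated README.md and adapter_model.bin
```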
README.md CHANGED
@@ -8,7 +8,15 @@ license: llama2

  0. using the updated [Meta's LLaMA-2 models](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf).
  1. support [4-bit qlora](https://arxiv.org/abs/2305.14314), extreme GPU memory and inference time saving;
- 2. comparable MMLU evaluation dataset results, llama2-13b's 54.8% to our 54.34% (-0.46%).
+ 2. comparable MMLU evaluation dataset results, llama2-13b-chat:
+
+ | | eval | test | comp-eval | comp-test |
+ |---------------|--------|--------|-----------|-----------|
+ |llama2-13b-chat| 54.58% | - | | |
+ |ckpt-800 | 53.86% | 53.32% | -0.72% | - |
+
+ llama2-13b-chat: "7389082e6bc4fcbf6202e6108a70194800e6c51e"
+
  3. This lazy-lora adapter is based on [Meta's LLaMA-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf), and using the [oasst1 dataset](https://huggingface.co/datasets/OpenAssistant/oasst1), following [Guanaco](https://huggingface.co/timdettmers/guanaco-65b).

  ### Introduction
@@ -84,67 +92,131 @@ model.print_trainable_parameters()

  ## MMLU result:

+ ### MMLU eval result:
+
  ```json
- {"mmlu_loss": 1.818716175022038,
- "mmlu_eval_accuracy_public_relations": 0.6666666666666666,
- "mmlu_eval_accuracy_college_medicine": 0.36363636363636365,
+ {"mmlu_loss": 1.6594436656097273,
+ "mmlu_eval_accuracy_high_school_mathematics": 0.2413793103448276,
+ "mmlu_eval_accuracy_high_school_biology": 0.5,
+ "mmlu_eval_accuracy_business_ethics": 0.45454545454545453,
+ "mmlu_eval_accuracy_jurisprudence": 0.6363636363636364,
+ "mmlu_eval_accuracy_virology": 0.4444444444444444,
+ "mmlu_eval_accuracy_logical_fallacies": 0.6666666666666666,
+ "mmlu_eval_accuracy_professional_law": 0.3176470588235294,
+ "mmlu_eval_accuracy_econometrics": 0.3333333333333333,
  "mmlu_eval_accuracy_prehistory": 0.6571428571428571,
- "mmlu_eval_accuracy_astronomy": 0.5625,
- "mmlu_eval_accuracy_moral_scenarios": 0.27,
- "mmlu_eval_accuracy_high_school_statistics": 0.34782608695652173,
- "mmlu_eval_accuracy_security_studies": 0.6296296296296297,
- "mmlu_eval_accuracy_anatomy": 0.6428571428571429,
- "mmlu_eval_accuracy_sociology": 0.7272727272727273,
- "mmlu_eval_accuracy_professional_law": 0.3352941176470588,
- "mmlu_eval_accuracy_high_school_macroeconomics": 0.5348837209302325,
- "mmlu_eval_accuracy_college_chemistry": 0.375,
- "mmlu_eval_accuracy_us_foreign_policy": 0.7272727272727273,
- "mmlu_eval_accuracy_clinical_knowledge": 0.5172413793103449,
- "mmlu_eval_accuracy_college_physics": 0.5454545454545454,
- "mmlu_eval_accuracy_high_school_chemistry": 0.22727272727272727,
- "mmlu_eval_accuracy_electrical_engineering": 0.375,
- "mmlu_eval_accuracy_nutrition": 0.6666666666666666,
  "mmlu_eval_accuracy_professional_accounting": 0.41935483870967744,
- "mmlu_eval_accuracy_high_school_government_and_politics": 0.5714285714285714,
- "mmlu_eval_accuracy_professional_medicine": 0.4838709677419355,
- "mmlu_eval_accuracy_high_school_physics": 0.29411764705882354,
- "mmlu_eval_accuracy_miscellaneous": 0.686046511627907,
- "mmlu_eval_accuracy_virology": 0.4444444444444444,
- "mmlu_eval_accuracy_college_computer_science": 0.45454545454545453,
- "mmlu_eval_accuracy_international_law": 0.9230769230769231,
- "mmlu_eval_accuracy_logical_fallacies": 0.8333333333333334,
- "mmlu_eval_accuracy_high_school_biology": 0.5,
- "mmlu_eval_accuracy_abstract_algebra": 0.45454545454545453,
- "mmlu_eval_accuracy_high_school_european_history": 0.6666666666666666,
- "mmlu_eval_accuracy_high_school_microeconomics": 0.5384615384615384,
- "mmlu_eval_accuracy_medical_genetics": 0.7272727272727273,
- "mmlu_eval_accuracy_formal_logic": 0.14285714285714285,
- "mmlu_eval_accuracy_marketing": 0.76,
- "mmlu_eval_accuracy_human_sexuality": 0.5,
- "mmlu_eval_accuracy_econometrics": 0.4166666666666667,
- "mmlu_eval_accuracy_college_mathematics": 0.45454545454545453,
- "mmlu_eval_accuracy_high_school_mathematics": 0.2413793103448276,
- "mmlu_eval_accuracy_moral_disputes": 0.5526315789473685,
- "mmlu_eval_accuracy_high_school_geography": 0.7727272727272727,
+ "mmlu_eval_accuracy_professional_psychology": 0.4782608695652174,
  "mmlu_eval_accuracy_management": 0.8181818181818182,
+ "mmlu_eval_accuracy_human_sexuality": 0.5,
+ "mmlu_eval_accuracy_college_mathematics": 0.5454545454545454,
+ "mmlu_eval_accuracy_us_foreign_policy": 0.8181818181818182,
+ "mmlu_eval_accuracy_high_school_european_history": 0.6666666666666666,
+ "mmlu_eval_accuracy_miscellaneous": 0.7441860465116279,
+ "mmlu_eval_accuracy_international_law": 0.8461538461538461,
  "mmlu_eval_accuracy_high_school_computer_science": 0.6666666666666666,
- "mmlu_eval_accuracy_machine_learning": 0.2727272727272727,
- "mmlu_eval_accuracy_high_school_us_history": 0.8636363636363636,
- "mmlu_eval_accuracy_business_ethics": 0.45454545454545453,
+ "mmlu_eval_accuracy_world_religions": 0.7894736842105263,
+ "mmlu_eval_accuracy_high_school_physics": 0.29411764705882354,
+ "mmlu_eval_accuracy_moral_scenarios": 0.28,
+ "mmlu_eval_accuracy_sociology": 0.7727272727272727,
+ "mmlu_eval_accuracy_professional_medicine": 0.45161290322580644,
  "mmlu_eval_accuracy_conceptual_physics": 0.4230769230769231,
- "mmlu_eval_accuracy_global_facts": 0.5,
- "mmlu_eval_accuracy_college_biology": 0.625,
- "mmlu_eval_accuracy_elementary_mathematics": 0.2682926829268293,
- "mmlu_eval_accuracy_high_school_world_history": 0.5769230769230769,
+ "mmlu_eval_accuracy_high_school_us_history": 0.8636363636363636,
+ "mmlu_eval_accuracy_clinical_knowledge": 0.5172413793103449,
  "mmlu_eval_accuracy_human_aging": 0.6086956521739131,
- "mmlu_eval_accuracy_jurisprudence": 0.6363636363636364,
- "mmlu_eval_accuracy_philosophy": 0.5,
- "mmlu_eval_accuracy_professional_psychology": 0.5217391304347826,
- "mmlu_eval_accuracy_world_religions": 0.7894736842105263,
- "mmlu_eval_accuracy_computer_security": 0.6363636363636364,
- "mmlu_eval_accuracy_high_school_psychology": 0.8,
- "mmlu_eval_accuracy": 0.5433557168763036,
- "epoch": 2.37}
+ "mmlu_eval_accuracy_college_medicine": 0.4090909090909091,
+ "mmlu_eval_accuracy_computer_security": 0.7272727272727273,
+ "mmlu_eval_accuracy_moral_disputes": 0.5789473684210527,
+ "mmlu_eval_accuracy_security_studies": 0.6296296296296297,
+ "mmlu_eval_accuracy_high_school_world_history": 0.5769230769230769,
+ "mmlu_eval_accuracy_public_relations": 0.5833333333333334,
+ "mmlu_eval_accuracy_medical_genetics": 0.7272727272727273,
+ "mmlu_eval_accuracy_electrical_engineering": 0.375,
+ "mmlu_eval_accuracy_marketing": 0.8,
+ "mmlu_eval_accuracy_high_school_geography": 0.7272727272727273,
+ "mmlu_eval_accuracy_high_school_government_and_politics": 0.5714285714285714,
+ "mmlu_eval_accuracy_abstract_algebra": 0.2727272727272727,
+ "mmlu_eval_accuracy_nutrition": 0.6363636363636364,
+ "mmlu_eval_accuracy_college_biology": 0.625,
+ "mmlu_eval_accuracy_formal_logic": 0.14285714285714285,
+ "mmlu_eval_accuracy_machine_learning": 0.5454545454545454,
+ "mmlu_eval_accuracy_high_school_psychology": 0.7333333333333333,
+ "mmlu_eval_accuracy_high_school_statistics": 0.34782608695652173,
+ "mmlu_eval_accuracy_philosophy": 0.5588235294117647,
+ "mmlu_eval_accuracy_high_school_microeconomics": 0.5769230769230769,
+ "mmlu_eval_accuracy_global_facts": 0.5,
+ "mmlu_eval_accuracy_anatomy": 0.6428571428571429,
+ "mmlu_eval_accuracy_college_computer_science": 0.36363636363636365,
+ "mmlu_eval_accuracy_college_physics": 0.5454545454545454,
+ "mmlu_eval_accuracy_high_school_chemistry": 0.2727272727272727,
+ "mmlu_eval_accuracy_astronomy": 0.5625,
+ "mmlu_eval_accuracy_elementary_mathematics": 0.21951219512195122,
+ "mmlu_eval_accuracy_high_school_macroeconomics": 0.4418604651162791,
+ "mmlu_eval_accuracy_college_chemistry": 0.25,
+ "mmlu_eval_accuracy": 0.5385831470660036}
+ ```
+
+ ### MMLU test result:
+ ```json
+ {"mmlu_loss": 1.6477740873911495,
+ "mmlu_test_accuracy_us_foreign_policy": 0.76,
+ "mmlu_test_accuracy_conceptual_physics": 0.3659574468085106,
+ "mmlu_test_accuracy_professional_accounting": 0.38652482269503546,
+ "mmlu_test_accuracy_high_school_world_history": 0.7088607594936709,
+ "mmlu_test_accuracy_human_aging": 0.6547085201793722,
+ "mmlu_test_accuracy_clinical_knowledge": 0.569811320754717,
+ "mmlu_test_accuracy_abstract_algebra": 0.36,
+ "mmlu_test_accuracy_machine_learning": 0.3392857142857143,
+ "mmlu_test_accuracy_high_school_geography": 0.6767676767676768,
+ "mmlu_test_accuracy_medical_genetics": 0.54,
+ "mmlu_test_accuracy_virology": 0.4939759036144578,
+ "mmlu_test_accuracy_professional_medicine": 0.4889705882352941,
+ "mmlu_test_accuracy_philosophy": 0.594855305466238,
+ "mmlu_test_accuracy_logical_fallacies": 0.656441717791411,
+ "mmlu_test_accuracy_formal_logic": 0.2857142857142857,
+ "mmlu_test_accuracy_electrical_engineering": 0.5103448275862069,
+ "mmlu_test_accuracy_anatomy": 0.4962962962962963,
+ "mmlu_test_accuracy_computer_security": 0.68,
+ "mmlu_test_accuracy_high_school_physics": 0.3509933774834437,
+ "mmlu_test_accuracy_high_school_statistics": 0.37962962962962965,
+ "mmlu_test_accuracy_high_school_us_history": 0.7009803921568627,
+ "mmlu_test_accuracy_college_biology": 0.5347222222222222,
+ "mmlu_test_accuracy_college_mathematics": 0.32,
+ "mmlu_test_accuracy_marketing": 0.7606837606837606,
+ "mmlu_test_accuracy_moral_scenarios": 0.2849162011173184,
+ "mmlu_test_accuracy_high_school_mathematics": 0.3148148148148148,
+ "mmlu_test_accuracy_high_school_microeconomics": 0.5168067226890757,
+ "mmlu_test_accuracy_college_computer_science": 0.48,
+ "mmlu_test_accuracy_college_chemistry": 0.35,
+ "mmlu_test_accuracy_global_facts": 0.31,
+ "mmlu_test_accuracy_management": 0.6990291262135923,
+ "mmlu_test_accuracy_security_studies": 0.6204081632653061,
+ "mmlu_test_accuracy_high_school_psychology": 0.7211009174311926,
+ "mmlu_test_accuracy_international_law": 0.7272727272727273,
+ "mmlu_test_accuracy_college_medicine": 0.44508670520231214,
+ "mmlu_test_accuracy_professional_psychology": 0.5098039215686274,
+ "mmlu_test_accuracy_high_school_european_history": 0.6545454545454545,
+ "mmlu_test_accuracy_prehistory": 0.5925925925925926,
+ "mmlu_test_accuracy_business_ethics": 0.51,
+ "mmlu_test_accuracy_high_school_chemistry": 0.45320197044334976,
+ "mmlu_test_accuracy_high_school_government_and_politics": 0.7461139896373057,
+ "mmlu_test_accuracy_astronomy": 0.5723684210526315,
+ "mmlu_test_accuracy_human_sexuality": 0.5877862595419847,
+ "mmlu_test_accuracy_miscellaneous": 0.735632183908046,
+ "mmlu_test_accuracy_public_relations": 0.6181818181818182,
+ "mmlu_test_accuracy_elementary_mathematics": 0.35185185185185186,
+ "mmlu_test_accuracy_world_religions": 0.7602339181286549,
+ "mmlu_test_accuracy_moral_disputes": 0.5838150289017341,
+ "mmlu_test_accuracy_econometrics": 0.2894736842105263,
+ "mmlu_test_accuracy_high_school_computer_science": 0.58,
+ "mmlu_test_accuracy_jurisprudence": 0.6296296296296297,
+ "mmlu_test_accuracy_nutrition": 0.5980392156862745,
+ "mmlu_test_accuracy_high_school_macroeconomics": 0.4897435897435897,
+ "mmlu_test_accuracy_professional_law": 0.36962190352020863,
+ "mmlu_test_accuracy_high_school_biology": 0.635483870967742,
+ "mmlu_test_accuracy_college_physics": 0.3235294117647059,
+ "mmlu_test_accuracy_sociology": 0.7164179104477612,
+ "mmlu_test_accuracy": 0.5332109924946602}
  ```

  ## License and intended use
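The headline figures in the new comparison table follow directly from the two JSON blocks added above: `mmlu_eval_accuracy` 0.5386 rounds to 53.86%, `mmlu_test_accuracy` 0.5332 to 53.32%, and 53.86% minus the quoted llama2-13b-chat baseline of 54.58% gives about -0.72%. A small sketch that recomputes them (not part of the commit; the local file names are assumed):

```python
# Recompute the README comparison table from the MMLU results above.
# Assumes the eval/test JSON blocks were saved locally as mmlu_eval.json and
# mmlu_test.json (hypothetical file names, not part of the commit).
import json

BASE_EVAL = 0.5458  # llama2-13b-chat MMLU eval accuracy quoted in the table

with open("mmlu_eval.json") as f:
    eval_res = json.load(f)
with open("mmlu_test.json") as f:
    test_res = json.load(f)

ckpt_eval = eval_res["mmlu_eval_accuracy"]   # 0.53858... -> 53.86%
ckpt_test = test_res["mmlu_test_accuracy"]   # 0.53321... -> 53.32%

print(f"ckpt-800 eval: {ckpt_eval:.2%}, test: {ckpt_test:.2%}")
print(f"comp-eval vs llama2-13b-chat: {ckpt_eval - BASE_EVAL:+.2%}")  # about -0.72%
```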
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:523e17537dd552110194c3dbcb3d267265234c4ba50764c4aed1e4bd521ba685
+ oid sha256:11500d8165c5ac7429525ca57ac156125403c9681a96924ab21f249262b61e6f
  size 500857293
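Since this commit replaces adapter_model.bin, downstream users pick up the new checkpoint-800 weights by reloading the adapter on top of the 4-bit quantized base model the README describes. A minimal loading sketch with transformers/peft/bitsandbytes (the adapter repo id is a placeholder; exact arguments may differ from the repository's own instructions):

```python
# Sketch: load Llama-2-13b-chat in 4-bit (QLoRA-style NF4) and apply this lazy-lora
# adapter on top. ADAPTER_REPO is a placeholder for this repository's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-2-13b-chat-hf"
ADAPTER_REPO = "<this-adapter-repo>"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)  # picks up the new adapter_model.bin
model.eval()

inputs = tokenizer("What is QLoRA?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```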