Commit 7f9abdc by cowWhySo (parent: e9ed0be): Update README.md

Files changed: README.md (+81, −0)
GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf

## Benchmarks

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)| 41| 67.56| 46.36| 39.3| 48.56|
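The Average column is the unweighted mean of the four suite scores; a quick sketch of that arithmetic (values copied from the table above):

```python
# Suite scores reported in the summary table above.
suite_scores = {
    "AGIEval": 41.00,
    "GPT4All": 67.56,
    "TruthfulQA": 46.36,
    "Bigbench": 39.30,
}

# Unweighted mean across the four suites.
average = sum(suite_scores.values()) / len(suite_scores)
assert abs(average - 48.56) < 0.01  # matches the reported 48.56
print(f"{average:.2f}")
```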

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |22.05|± | 2.61|
| | |acc_norm|22.05|± | 2.61|
|agieval_logiqa_en | 0|acc |41.01|± | 1.93|
| | |acc_norm|41.32|± | 1.93|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75|
| | |acc_norm|22.17|± | 2.75|
|agieval_lsat_lr | 0|acc |45.69|± | 2.21|
| | |acc_norm|45.88|± | 2.21|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00|
| | |acc_norm|56.51|± | 3.03|
|agieval_sat_en | 0|acc |75.24|± | 3.01|
| | |acc_norm|70.39|± | 3.19|
|agieval_sat_en_without_passage| 0|acc |39.81|± | 3.42|
| | |acc_norm|37.86|± | 3.39|
|agieval_sat_math | 0|acc |33.64|± | 3.19|
| | |acc_norm|31.82|± | 3.15|

Average: 41.0%
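Each suite average is the plain mean of its task scores; for AGIEval, averaging the `acc_norm` values from the table reproduces the reported 41.0%:

```python
# AGIEval acc_norm scores, copied from the table above (one per task).
acc_norm = [22.05, 41.32, 22.17, 45.88, 56.51, 70.39, 37.86, 31.82]

agieval_avg = sum(acc_norm) / len(acc_norm)
assert abs(agieval_avg - 41.0) < 0.05  # matches "Average: 41.0%"
print(f"{agieval_avg:.1f}")
```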

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |49.74|± | 1.46|
| | |acc_norm|50.43|± | 1.46|
|arc_easy | 0|acc |76.68|± | 0.87|
| | |acc_norm|73.23|± | 0.91|
|boolq | 1|acc |79.27|± | 0.71|
|hellaswag | 0|acc |57.91|± | 0.49|
| | |acc_norm|77.13|± | 0.42|
|openbookqa | 0|acc |35.00|± | 2.14|
| | |acc_norm|43.80|± | 2.22|
|piqa | 0|acc |77.86|± | 0.97|
| | |acc_norm|79.54|± | 0.94|
|winogrande | 0|acc |69.53|± | 1.29|

Average: 67.56%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |31.21|± | 1.62|
| | |mc2 |46.36|± | 1.55|

Average: 46.36%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62|
|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|29.46|± | 2.84|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.98|± | 1.72|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.00|± | 2.01|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|17.14|± | 1.43|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.67|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|24.40|± | 1.92|
|bigbench_navigate | 0|multiple_choice_grade|53.70|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.10|± | 1.04|
|bigbench_ruin_names | 0|multiple_choice_grade|31.03|± | 2.19|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|15.93|± | 1.16|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12|
|bigbench_sports_understanding | 0|multiple_choice_grade|52.64|± | 1.59|
|bigbench_temporal_sequences | 0|multiple_choice_grade|51.50|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.52|± | 1.12|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|13.89|± | 0.83|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.67|± | 2.88|

Average: 39.3%

Average score: 48.56%

## Training Summary

```json