macadeliccc committed
Commit 1636c24
1 parent: f6358a5

Update README.md

  1. README.md +80 -1
README.md CHANGED
@@ -67,7 +67,86 @@ print(generate_response(prompt), "\n")
  | | |none | 0|acc_norm|0.8058|± |0.0092|
  |winogrande |Yaml |none | 0|acc |0.7372|± |0.0124|
 
-
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+ |---------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+ |[SOLAR-math-2x10.7b](https://huggingface.co/macadeliccc/SOLAR-math-2x10.7b)| 47.2| 75.18| 64.73| 45.15| 58.07|
+
+ ### AGIEval
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |30.31|± | 2.89|
+ | | |acc_norm|30.31|± | 2.89|
+ |agieval_logiqa_en | 0|acc |43.78|± | 1.95|
+ | | |acc_norm|43.93|± | 1.95|
+ |agieval_lsat_ar | 0|acc |21.74|± | 2.73|
+ | | |acc_norm|19.13|± | 2.60|
+ |agieval_lsat_lr | 0|acc |57.25|± | 2.19|
+ | | |acc_norm|56.47|± | 2.20|
+ |agieval_lsat_rc | 0|acc |68.77|± | 2.83|
+ | | |acc_norm|68.03|± | 2.85|
+ |agieval_sat_en | 0|acc |78.16|± | 2.89|
+ | | |acc_norm|79.13|± | 2.84|
+ |agieval_sat_en_without_passage| 0|acc |47.57|± | 3.49|
+ | | |acc_norm|44.66|± | 3.47|
+ |agieval_sat_math | 0|acc |41.36|± | 3.33|
+ | | |acc_norm|35.91|± | 3.24|
+
+ Average: 47.2%
+
+ ### GPT4All
+ | Task |Version| Metric |Value| |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge| 0|acc |59.22|± | 1.44|
+ | | |acc_norm|61.43|± | 1.42|
+ |arc_easy | 0|acc |84.26|± | 0.75|
+ | | |acc_norm|83.63|± | 0.76|
+ |boolq | 1|acc |88.69|± | 0.55|
+ |hellaswag | 0|acc |65.98|± | 0.47|
+ | | |acc_norm|84.29|± | 0.36|
+ |openbookqa | 0|acc |34.20|± | 2.12|
+ | | |acc_norm|47.20|± | 2.23|
+ |piqa | 0|acc |81.83|± | 0.90|
+ | | |acc_norm|82.59|± | 0.88|
+ |winogrande | 0|acc |78.45|± | 1.16|
+
+ Average: 75.18%
+
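The README does not say how the per-suite average is derived. The reported 75.18% is consistent with taking `acc_norm` for each task and falling back to `acc` where no normalized score is listed (boolq, winogrande). The sketch below reproduces that arithmetic under this assumption; the dictionary layout and the `suite_average` helper are illustrative names, not part of the evaluation tooling.

```python
# Per-task scores copied from the GPT4All table above.
gpt4all = {
    "arc_challenge": {"acc": 59.22, "acc_norm": 61.43},
    "arc_easy": {"acc": 84.26, "acc_norm": 83.63},
    "boolq": {"acc": 88.69},
    "hellaswag": {"acc": 65.98, "acc_norm": 84.29},
    "openbookqa": {"acc": 34.20, "acc_norm": 47.20},
    "piqa": {"acc": 81.83, "acc_norm": 82.59},
    "winogrande": {"acc": 78.45},
}


def suite_average(tasks):
    """Unweighted mean over tasks, preferring acc_norm and falling back to acc.

    This selection rule is an assumption inferred from the reported numbers.
    """
    scores = [metrics.get("acc_norm", metrics["acc"]) for metrics in tasks.values()]
    return sum(scores) / len(scores)


print(f"GPT4All average: {suite_average(gpt4all):.2f}%")
```

The same rule applied to the AGIEval table (all `acc_norm`) gives about 47.2, which also agrees with the figure reported there.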
+ ### TruthfulQA
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |48.47|± | 1.75|
+ | | |mc2 |64.73|± | 1.53|
+
+ Average: 64.73%
+
+ ### Bigbench
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|61.05|± | 3.55|
+ |bigbench_date_understanding | 0|multiple_choice_grade|68.56|± | 2.42|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|35.27|± | 2.98|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|31.20|± | 2.45|
+ | | |exact_str_match | 0.00|± | 0.00|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.00|± | 2.05|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|23.43|± | 1.60|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|46.00|± | 2.88|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|35.60|± | 2.14|
+ |bigbench_navigate | 0|multiple_choice_grade|57.50|± | 1.56|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|55.80|± | 1.11|
+ |bigbench_ruin_names | 0|multiple_choice_grade|45.98|± | 2.36|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|40.58|± | 1.56|
+ |bigbench_snarks | 0|multiple_choice_grade|66.85|± | 3.51|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|71.40|± | 1.44|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|56.40|± | 1.57|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|24.00|± | 1.21|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.09|± | 0.90|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|46.00|± | 2.88|
+
+ Average: 45.15%
+
+ Average score: 58.07%
+
+ Elapsed time: 04:05:27
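The overall "Average score" appears to be the unweighted mean of the four suite averages; the README does not state this, but the arithmetic agrees to within rounding. A minimal check, with `overall_average` as a hypothetical helper name:

```python
import math

# Suite averages as reported in the summary table of this README change.
suite_scores = {
    "AGIEval": 47.2,
    "GPT4All": 75.18,
    "TruthfulQA": 64.73,
    "Bigbench": 45.15,
}


def overall_average(scores):
    """Unweighted mean of the suite averages (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


overall = overall_average(suite_scores)
# The mean is about 58.065, so compare with a tolerance rather than
# relying on how the last digit rounds.
print(math.isclose(overall, 58.07, abs_tol=0.01))
```

Each suite counts equally under this rule, so the 18 Bigbench tasks together carry the same weight as the single TruthfulQA `mc2` score.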

  ### 📚 Citations