macadeliccc committed
Commit 1636c24
1 Parent(s): f6358a5
Update README.md
README.md CHANGED
@@ -67,7 +67,86 @@ print(generate_response(prompt), "\n")
 | | |none | 0|acc_norm|0.8058|± |0.0092|
 |winogrande |Yaml |none | 0|acc |0.7372|± |0.0124|
 
-
+| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|---------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[SOLAR-math-2x10.7b](https://huggingface.co/macadeliccc/SOLAR-math-2x10.7b)| 47.2| 75.18| 64.73| 45.15| 58.07|
+
+### AGIEval
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------|------:|--------|----:|---|-----:|
+|agieval_aqua_rat | 0|acc |30.31|± | 2.89|
+| | |acc_norm|30.31|± | 2.89|
+|agieval_logiqa_en | 0|acc |43.78|± | 1.95|
+| | |acc_norm|43.93|± | 1.95|
+|agieval_lsat_ar | 0|acc |21.74|± | 2.73|
+| | |acc_norm|19.13|± | 2.60|
+|agieval_lsat_lr | 0|acc |57.25|± | 2.19|
+| | |acc_norm|56.47|± | 2.20|
+|agieval_lsat_rc | 0|acc |68.77|± | 2.83|
+| | |acc_norm|68.03|± | 2.85|
+|agieval_sat_en | 0|acc |78.16|± | 2.89|
+| | |acc_norm|79.13|± | 2.84|
+|agieval_sat_en_without_passage| 0|acc |47.57|± | 3.49|
+| | |acc_norm|44.66|± | 3.47|
+|agieval_sat_math | 0|acc |41.36|± | 3.33|
+| | |acc_norm|35.91|± | 3.24|
+
+Average: 47.2%
+
+### GPT4All
+| Task |Version| Metric |Value| |Stderr|
+|-------------|------:|--------|----:|---|-----:|
+|arc_challenge| 0|acc |59.22|± | 1.44|
+| | |acc_norm|61.43|± | 1.42|
+|arc_easy | 0|acc |84.26|± | 0.75|
+| | |acc_norm|83.63|± | 0.76|
+|boolq | 1|acc |88.69|± | 0.55|
+|hellaswag | 0|acc |65.98|± | 0.47|
+| | |acc_norm|84.29|± | 0.36|
+|openbookqa | 0|acc |34.20|± | 2.12|
+| | |acc_norm|47.20|± | 2.23|
+|piqa | 0|acc |81.83|± | 0.90|
+| | |acc_norm|82.59|± | 0.88|
+|winogrande | 0|acc |78.45|± | 1.16|
+
+Average: 75.18%
+
+### TruthfulQA
+| Task |Version|Metric|Value| |Stderr|
+|-------------|------:|------|----:|---|-----:|
+|truthfulqa_mc| 1|mc1 |48.47|± | 1.75|
+| | |mc2 |64.73|± | 1.53|
+
+Average: 64.73%
+
+### Bigbench
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------------------------|------:|---------------------|----:|---|-----:|
+|bigbench_causal_judgement | 0|multiple_choice_grade|61.05|± | 3.55|
+|bigbench_date_understanding | 0|multiple_choice_grade|68.56|± | 2.42|
+|bigbench_disambiguation_qa | 0|multiple_choice_grade|35.27|± | 2.98|
+|bigbench_geometric_shapes | 0|multiple_choice_grade|31.20|± | 2.45|
+| | |exact_str_match | 0.00|± | 0.00|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.00|± | 2.05|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|23.43|± | 1.60|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|46.00|± | 2.88|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|35.60|± | 2.14|
+|bigbench_navigate | 0|multiple_choice_grade|57.50|± | 1.56|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|55.80|± | 1.11|
+|bigbench_ruin_names | 0|multiple_choice_grade|45.98|± | 2.36|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|40.58|± | 1.56|
+|bigbench_snarks | 0|multiple_choice_grade|66.85|± | 3.51|
+|bigbench_sports_understanding | 0|multiple_choice_grade|71.40|± | 1.44|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|56.40|± | 1.57|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|24.00|± | 1.21|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.09|± | 0.90|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|46.00|± | 2.88|
+
+Average: 45.15%
+
+Average score: 58.07%
+
+Elapsed time: 04:05:27
 
 ### 📚 Citations
 
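
The "Average" lines added in this diff can be reproduced from the table values. The sketch below assumes each suite average is the mean of `acc_norm` where a task reports it and `acc` otherwise, that TruthfulQA uses `mc2`, and that Bigbench uses `multiple_choice_grade`. This selection rule is inferred from the numbers rather than stated in the README, and the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the tables added in this diff.
# Assumed rule: acc_norm when reported, otherwise acc.
suites = {
    "AGIEval": [30.31, 43.93, 19.13, 56.47, 68.03, 79.13, 44.66, 35.91],
    "GPT4All": [61.43, 83.63, 88.69, 84.29, 47.20, 82.59, 78.45],
    # Assumed: only mc2 counts toward the TruthfulQA average.
    "TruthfulQA": [64.73],
    # Assumed: only multiple_choice_grade counts (exact_str_match is skipped).
    "Bigbench": [
        61.05, 68.56, 35.27, 31.20, 30.00, 23.43, 46.00, 35.60, 57.50,
        55.80, 45.98, 40.58, 66.85, 71.40, 56.40, 24.00, 17.09, 46.00,
    ],
}

# Per-suite averages, rounded to two decimals as in the README.
suite_averages = {name: round(mean(scores), 2) for name, scores in suites.items()}

# Overall score: unweighted mean of the four suite averages (about 58.065,
# which the README reports as 58.07).
overall = mean(suite_averages.values())

for name, value in suite_averages.items():
    print(f"{name}: {value}")
print(f"Overall: {overall:.3f}")
```

Under these assumptions the suite averages match the 47.2, 75.18, 64.73, and 45.15 shown in the summary table, which suggests the overall score is an unweighted mean across suites rather than across individual tasks.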