slimfrikha-tii committed on
Commit 3097aa6 • 1 Parent(s): d570153
docs(readme): update
README.md CHANGED
@@ -128,9 +128,9 @@ We report in the following table our internal pipeline benchmarks:
 <tr>
 <td rowspan="3">Math</td>
 <td>GSM8K (5-shot)</td>
-<td>
-<td>
-<td><b>
+<td>78.1</td>
+<td>77.5</td>
+<td><b>79.1</b></td>
 </tr>
 <tr>
 <td>GSM8k (8-shot, COT)</td>
@@ -145,7 +145,7 @@ We report in the following table our internal pipeline benchmarks:
 <td><b>33.1</b></td>
 </tr>
 <tr>
-<td rowspan="
+<td rowspan="5">Reasoning</td>
 <td>Arc Challenge (25-shot)</td>
 <td>46.6</td>
 <td>55.7</td>
@@ -175,12 +175,6 @@ We report in the following table our internal pipeline benchmarks:
 <td><b>53.9</b></td>
 <td>52.4</td>
 </tr>
-<tr>
-<td>BBH (3-shot, COT)</td>
-<td>6.7</td>
-<td>21.2</td>
-<td><b>69.3</b></td>
-</tr>
 <tr>
 <td rowspan="4">CommonSense Understanding</td>
 <td>PIQA (0-shot)</td>