Update README.md
README.md CHANGED
@@ -29,18 +29,14 @@ The models are fine-tuned with the WEBINSTRUCT dataset using the original Llama-
 The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
 
 
-
-
-
-
-| **
-
-| **MAmmoTH2-
-| **MAmmoTH2-
-| **MAmmoTH2-8x7B** | 32.2 | 39.0 | 75.4 | 36.8 | 67.4 | 71.1 | 87.5 | 58.9 |
-| **MAmmoTH2-7B-Plus** | 29.2 | 45.0 | 84.7 | 36.8 | 64.5 | 63.1 | 83.0 | 58.0 |
-| **MAmmoTH2-8B-Plus** | 32.5 | 42.8 | 84.1 | 37.3 | 65.7 | 67.8 | 83.4 | 59.1 |
-| **MAmmoTH2-8x7B-Plus** | 34.1 | 47.0 | 86.4 | 37.8 | 72.4 | 74.1 | 88.4 | 62.9 |
+| **Model** | **TheoremQA** | **MATH** | **GSM8K** | **GPQA** | **MMLU-ST** | **BBH** | **ARC-C** | **Avg** |
+|------------------------|---------------|----------|-----------|----------|-------------|---------|-----------|---------|
+| **MAmmoTH2-7B** | 26.7 | 34.2 | 67.4 | 34.8 | 60.6 | 60.0 | 81.8 | 52.2 |
+| **MAmmoTH2-8B** | 29.7 | 33.4 | 67.9 | 38.4 | 61.0 | 60.8 | 81.0 | 53.1 |
+| **MAmmoTH2-8x7B** | 32.2 | 39.0 | 75.4 | 36.8 | 67.4 | 71.1 | 87.5 | 58.9 |
+| **MAmmoTH2-7B-Plus** | 29.2 | 45.0 | 84.7 | 36.8 | 64.5 | 63.1 | 83.0 | 58.0 |
+| **MAmmoTH2-8B-Plus** | 32.5 | 42.8 | 84.1 | 37.3 | 65.7 | 67.8 | 83.4 | 59.1 |
+| **MAmmoTH2-8x7B-Plus** | 34.1 | 47.0 | 86.4 | 37.8 | 72.4 | 74.1 | 88.4 | 62.9 |
 
 
 