Update README.md
README.md
@@ -34,10 +34,10 @@ The models are evaluated using open-ended and multiple-choice math problems from
 | **Model** | **TheoremQA** | **MATH** | **GSM8K** | **GPQA** | **MMLU-ST** | **BBH** | **ARC-C** | **Avg** |
 |:---------------------------------------|:--------------|:---------|:----------|:---------|:------------|:--------|:----------|:--------|
 | **MAmmoTH2-7B** (Updated) | 29.0 | 36.7 | 68.4 | 32.4 | 62.4 | 58.6 | 81.7 | 52.7 |
-| **MAmmoTH2-8B**
+| **MAmmoTH2-8B** (Updated) | 30.3 | 35.8 | 70.4 | 35.2 | 64.2 | 62.1 | 82.2 | 54.3 |
 | **MAmmoTH2-8x7B** | 32.2 | 39.0 | 75.4 | 36.8 | 67.4 | 71.1 | 87.5 | 58.9 |
-| **MAmmoTH2-7B-Plus**
-| **MAmmoTH2-8B-Plus**
+| **MAmmoTH2-7B-Plus** (Updated) | 31.2 | 46.0 | 84.6 | 33.8 | 63.8 | 63.3 | 84.4 | 58.1 |
+| **MAmmoTH2-8B-Plus** (Updated) | 31.5 | 43.0 | 85.2 | 35.8 | 66.7 | 69.7 | 84.3 | 59.4 |
 | **MAmmoTH2-8x7B-Plus** | 34.1 | 47.0 | 86.4 | 37.8 | 72.4 | 74.1 | 88.4 | 62.9 |

 To reproduce our results, please refer to https://github.com/TIGER-AI-Lab/MAmmoTH2/tree/main/math_eval.
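
The README excerpt does not say how the **Avg** column is computed. As a quick sanity check on the updated rows, the sketch below assumes it is the unweighted mean of the seven benchmark scores, rounded to one decimal. That definition is an assumption, and the names `ROWS`, `recomputed_avg`, and `avg_gaps` are hypothetical helpers, not part of the MAmmoTH2 evaluation code.

```python
from statistics import mean

# Scores copied from the updated table: seven benchmark columns
# (TheoremQA, MATH, GSM8K, GPQA, MMLU-ST, BBH, ARC-C), then the reported Avg.
ROWS = {
    "MAmmoTH2-7B": ([29.0, 36.7, 68.4, 32.4, 62.4, 58.6, 81.7], 52.7),
    "MAmmoTH2-8B": ([30.3, 35.8, 70.4, 35.2, 64.2, 62.1, 82.2], 54.3),
    "MAmmoTH2-8x7B": ([32.2, 39.0, 75.4, 36.8, 67.4, 71.1, 87.5], 58.9),
    "MAmmoTH2-7B-Plus": ([31.2, 46.0, 84.6, 33.8, 63.8, 63.3, 84.4], 58.1),
    "MAmmoTH2-8B-Plus": ([31.5, 43.0, 85.2, 35.8, 66.7, 69.7, 84.3], 59.4),
    "MAmmoTH2-8x7B-Plus": ([34.1, 47.0, 86.4, 37.8, 72.4, 74.1, 88.4], 62.9),
}


def recomputed_avg(scores):
    """Unweighted mean rounded to one decimal (assumed definition of Avg)."""
    return round(mean(scores), 1)


def avg_gaps(rows=ROWS):
    """Map each model to (recomputed Avg - reported Avg), rounded to one decimal."""
    return {
        name: round(recomputed_avg(scores) - reported, 1)
        for name, (scores, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, gap in avg_gaps().items():
        print(f"{name:20s} recomputed - reported = {gap:+.1f}")
```

Under this assumption, the recomputed means agree with the reported Avg to within 0.1 for every row except MAmmoTH2-8x7B, where the listed scores average to about 58.5 against a reported 58.9. The excerpt gives no explanation for that gap, so it may reflect unrounded underlying scores or a different averaging rule.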