TIGER-Lab/MAmmoTH2-8x7B

@@ -29,26 +29,18 @@ The models are fine-tuned with the WEBINSTRUCT dataset using the original Llama-
 The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
-| **Model**             	| **Decoding** 	| **GSM**  	| **MATH** 	| **AQuA** 	| **NumG** 	| **SVA**  	| **Mat**  	| **Sim**  	| **SAT**  	| **MMLU** 	| **AVG**  	|
-|-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
-| **MAmmoTH2-7B**        	| CoT          	| 50.5     	| 10.4     	| 43.7     	| 44.0     	| 47.3     	| 9.2      	| 18.9     	| 32.7     	| 39.9     	| 33.0     	|
-|                       	| PoT          	| 51.6     	| 28.7     	| 43.3     	| 52.3     	| 65.1     	| 41.9     	| 48.2     	| 39.1     	| 44.6     	| 46.1     	|
-|                       	| **Hybrid**   	| **53.6** 	| **31.5** 	| **44.5** 	| **61.2** 	| **67.7** 	| **46.3** 	| **41.2** 	| **42.7** 	| **42.6** 	| **47.9** 	|
-| **MAmmoTH2-8B**        	| CoT          	| 22.4     	| 7.9      	| 36.2     	| 36.0     	| 37.0     	| 8.2      	| 7.2      	| 32.7     	| 34.6     	| 24.7     	|
-|                       	| PoT          	| 58.8     	| 32.1     	| 47.2     	| 57.1     	| 71.1     	| 53.9     	| 44.6     	| 40.0     	| 47.8     	| 50.3     	|
-|                       	| **Hybrid**   	| **59.4** 	| **33.4** 	| **47.2** 	| **66.4** 	| **71.4** 	| **55.4** 	| **45.9** 	| **40.5** 	| **48.3** 	| **52.0** 	|
-| **MAmmoTH2-8x7B**       	| CoT          	| 56.3     	| 12.9     	| 45.3     	| 45.6     	| 53.8     	| 11.7     	| 22.4     	| 43.6     	| 42.3     	| 37.1     	|
-|                       	| PoT          	| 61.3     	| 32.6     	| 48.8     	| 59.6     	| 72.2     	| 48.5     	| 40.3     	| 46.8     	| 45.4     	| 50.6     	|
-|                       	| **Hybrid**   	| **62.0** 	| **34.2** 	| **51.6** 	| **68.7** 	| **72.4** 	| **49.2** 	| **43.2** 	| **46.8** 	| **47.6** 	| **52.9** 	|
-| **MAmmoTH2-7B-Plus**      | CoT          	| 50.5     	| 10.4     	| 43.7     	| 44.0     	| 47.3     	| 9.2      	| 18.9     	| 32.7     	| 39.9     	| 33.0     	|
-|                       	| PoT          	| 51.6     	| 28.7     	| 43.3     	| 52.3     	| 65.1     	| 41.9     	| 48.2     	| 39.1     	| 44.6     	| 46.1     	|
-|                       	| **Hybrid**   	| **53.6** 	| **31.5** 	| **44.5** 	| **61.2** 	| **67.7** 	| **46.3** 	| **41.2** 	| **42.7** 	| **42.6** 	| **47.9** 	|
-| **MAmmoTH2-8B-Plus**     	| CoT          	| 22.4     	| 7.9      	| 36.2     	| 36.0     	| 37.0     	| 8.2      	| 7.2      	| 32.7     	| 34.6     	| 24.7     	|
-|                       	| PoT          	| 58.8     	| 32.1     	| 47.2     	| 57.1     	| 71.1     	| 53.9     	| 44.6     	| 40.0     	| 47.8     	| 50.3     	|
-|                       	| **Hybrid**   	| **59.4** 	| **33.4** 	| **47.2** 	| **66.4** 	| **71.4** 	| **55.4** 	| **45.9** 	| **40.5** 	| **48.3** 	| **52.0** 	|
-| **MAmmoTH2-8x7B-Plus**   	| CoT          	| 56.3     	| 12.9     	| 45.3     	| 45.6     	| 53.8     	| 11.7     	| 22.4     	| 43.6     	| 42.3     	| 37.1     	|
-|                       	| PoT          	| 61.3     	| 32.6     	| 48.8     	| 59.6     	| 72.2     	| 48.5     	| 40.3     	| 46.8     	| 45.4     	| 50.6     	|
-|                       	| **Hybrid**   	| **62.0** 	| **34.2** 	| **51.6** 	| **68.7** 	| **72.4** 	| **49.2** 	| **43.2** 	| **46.8** 	| **47.6** 	| **52.9** 	|

+| **Model**              | **TheoremQA** | **MATH** | **GSM8K** | **GPQA** | **MMLU-ST** | **BBH** | **ARC-C** | **Avg** |
+|------------------------|---------------|----------|-----------|----------|-------------|---------|-----------|---------|
+| **MAmmoTH2-7B**        | 26.7         | 34.2    | 67.4     | 34.8     | 60.6        | 60.0    | 81.8      | 52.2    |
+| **MAmmoTH2-8B**        | 29.7         | 33.4    | 67.9     | 38.4     | 61.0        | 60.8    | 81.0      | 53.1    |
+| **MAmmoTH2-8x7B**      | 32.2         | 39.0    | 75.4     | 36.8     | 67.4        | 71.1    | 87.5      | 58.9    |
+| **MAmmoTH2-7B-Plus**   | 29.2         | 45.0    | 84.7     | 36.8     | 64.5        | 63.1    | 83.0      | 58.0    |
+| **MAmmoTH2-8B-Plus**   | 32.5         | 42.8    | 84.1     | 37.3     | 65.7        | 67.8    | 83.4      | 59.1    |
+| **MAmmoTH2-8x7B-Plus** | 34.1         | 47.0    | 86.4     | 37.8     | 72.4        | 74.1    | 88.4      | 62.9    |
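
The last column can be sanity-checked against the seven score columns. The sketch below assumes that Avg is the unweighted mean of those seven scores; the model card does not say how it is computed, so treat that as an assumption. The helper names `recomputed_avg` and `avg_gaps` are hypothetical, and the numbers are copied from the table above in column order.

```python
# Each entry: (seven benchmark scores in table column order, reported Avg).
# Values are copied from the table; nothing here is re-measured.
ROWS = {
    "MAmmoTH2-7B": ([26.7, 34.2, 67.4, 34.8, 60.6, 60.0, 81.8], 52.2),
    "MAmmoTH2-8B": ([29.7, 33.4, 67.9, 38.4, 61.0, 60.8, 81.0], 53.1),
    "MAmmoTH2-8x7B": ([32.2, 39.0, 75.4, 36.8, 67.4, 71.1, 87.5], 58.9),
    "MAmmoTH2-7B-Plus": ([29.2, 45.0, 84.7, 36.8, 64.5, 63.1, 83.0], 58.0),
    "MAmmoTH2-8B-Plus": ([32.5, 42.8, 84.1, 37.3, 65.7, 67.8, 83.4], 59.1),
    "MAmmoTH2-8x7B-Plus": ([34.1, 47.0, 86.4, 37.8, 72.4, 74.1, 88.4], 62.9),
}


def recomputed_avg(scores):
    """Unweighted mean of the benchmark scores (assumed definition of Avg)."""
    return sum(scores) / len(scores)


def avg_gaps(rows):
    """Map each model to (recomputed mean - reported Avg), rounded to 2 decimals."""
    return {
        name: round(recomputed_avg(scores) - reported, 2)
        for name, (scores, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, gap in avg_gaps(ROWS).items():
        print(f"{name}: {gap:+.2f}")
```

A gap well above rounding error for a row would suggest either a transcription slip in that row or an Avg computed differently from the assumption made here.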