wenhu committed on
Commit e0a2c19
1 Parent(s): 939b37c

Update README.md

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -15,12 +15,13 @@ Code: [https://github.com/TIGER-AI-Lab/MAmmoTH2](https://github.com/TIGER-AI-Lab
  ## Introduction
  Introducing 🦣 MAmmoTH2, a game-changer in improving the reasoning abilities of large language models (LLMs) through innovative instruction tuning. By efficiently harvesting 10 million instruction-response pairs from the pre-training web corpus, we've developed MAmmoTH2 models that significantly boost performance on reasoning benchmarks. For instance, MAmmoTH2-7B (Mistral) sees its performance soar from 11% to 34% on MATH and from 36% to 67% on GSM8K, all without training on any domain-specific data. Further training on public instruction tuning datasets yields MAmmoTH2-Plus, setting new standards in reasoning and chatbot benchmarks. Our work presents a cost-effective approach to acquiring large-scale, high-quality instruction data, offering a fresh perspective on enhancing LLM reasoning abilities.
  | | **Base Model** | **MAmmoTH2** | **MAmmoTH2-Plus** |
- |------|------------------|-------------------------------------------------------------------|------------------------------------------------------------------|
+ |:-----|:---------------------|:-------------------------------------------------------------------|:------------------------------------------------------------------|
  | 7B | Mistral | 🦣 [MAmmoTH2-7B](https://huggingface.co/TIGER-Lab/MAmmoTH2-7B) | 🦣 [MAmmoTH2-7B-Plus](https://huggingface.co/TIGER-Lab/MAmmoTH2-7B-Plus) |
  | 8B | Llama-3 | 🦣 [MAmmoTH2-8B](https://huggingface.co/TIGER-Lab/MAmmoTH2-8B) | 🦣 [MAmmoTH2-8B-Plus](https://huggingface.co/TIGER-Lab/MAmmoTH2-8B-Plus) |
  | 8x7B | Mixtral | 🦣 [MAmmoTH2-8x7B](https://huggingface.co/TIGER-Lab/MAmmoTH2-8x7B) | 🦣 [MAmmoTH2-8x7B-Plus](https://huggingface.co/TIGER-Lab/MAmmoTH2-8x7B-Plus) |
  ## Training Data
- (WEBINSTRUCT) Coming soon...
+ Please refer to https://huggingface.co/datasets/TIGER-Lab/WebInstructSub for more details.
+
  ![Project Framework](webinstruct.png)
 
  ## Training Procedure
@@ -31,7 +32,7 @@ The models are evaluated using open-ended and multiple-choice math problems from
 
 
  | **Model** | **TheoremQA** | **MATH** | **GSM8K** | **GPQA** | **MMLU-ST** | **BBH** | **ARC-C** | **Avg** |
- |------------------------|---------------|----------|-----------|----------|-------------|---------|-----------|---------|
+ |:-----------------------|:--------------|:---------|:----------|:---------|:------------|:--------|:----------|:---------|
  | **MAmmoTH2-7B** | 26.7 | 34.2 | 67.4 | 34.8 | 60.6 | 60.0 | 81.8 | 52.2 |
  | **MAmmoTH2-8B** | 29.7 | 33.4 | 67.9 | 38.4 | 61.0 | 60.8 | 81.0 | 53.1 |
  | **MAmmoTH2-8x7B** | 32.2 | 39.0 | 75.4 | 36.8 | 67.4 | 71.1 | 87.5 | 58.9 |
@@ -55,8 +56,8 @@ If you use the models, data, or code from this project, please cite the original
  ```
  @article{yue2024mammoth2,
    title={MAmmoTH2: Scaling Instructions from the Web},
-   author={Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen},
-   journal={arXiv preprint arXiv:2405.03548v1},
+   author={Yue, Xiang and Zheng, Tuney and Zhang, Ge and Chen, Wenhu},
+   journal={arXiv preprint arXiv:2405.03548},
    year={2024}
  }
  ```
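
The model table in the README diff above links to the released checkpoints on the Hugging Face Hub. Below is a minimal sketch of loading one of them with the `transformers` library; the model ID comes from that table, while the prompt and generation settings are illustrative assumptions rather than the authors' documented recipe.

```python
# Minimal sketch: load a MAmmoTH2 checkpoint listed in the table above.
# The model ID is taken from the README; the prompt format and generation
# parameters are illustrative assumptions, not the authors' official setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TIGER-Lab/MAmmoTH2-7B-Plus"  # any ID from the table works the same way

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: half precision to fit a single GPU
    device_map="auto",
)

prompt = "Natalia sold clips to 48 friends in April, and then half as many in May. How many clips did she sell altogether?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the sketch deterministic; adjust max_new_tokens as needed.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```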