Papers
arxiv:2404.02893

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Published on Apr 3
· Featured in Daily Papers on Apr 4
Authors:
,
,
,
,
,
,
,
,
,
,
,

Abstract

Large language models (LLMs) have shown excellent mastering of human language, but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets to enhance LLMs' mathematics are developed, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems.In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM's own generations for data collection. Based on ChatGLM3-32B, we conduct a series of experiments on both academic and our newly created challenging dataset, MathUserEval. Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed to ChatGLM\url{https://chatglm.cn}, an online serving LLM. Related evaluation dataset and scripts are released at https://github.com/THUDM/ChatGLM-Math.

Community

https://github.com/THUDM/ChatGLM-Math is not accessible :(

information (math) wants to be free -- let it! :)

·

https://github.com/THUDM/ChatGLM-Math)is now accessible. Thank you for your feedback.

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2404.02893 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2404.02893 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2404.02893 in a Space README.md to link it from this page.

Collections including this paper 8