---
library_name: transformers
tags:
- unsloth
- trl
- grpo
- reasoning
- gsm8k
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for Qwen2.5-0.5B-Instruct-GSM8K-Reasoning

This model is a fine-tuned version of **Qwen2.5-0.5B-Instruct**, adapted for **mathematical reasoning tasks** on the **GSM8K dataset**. It was trained with **GRPO (Group Relative Policy Optimization)**, the reinforcement learning method introduced in the paper *DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models*. Fine-tuning used **Unsloth** for memory- and speed-efficient training and **TRL (Transformer Reinforcement Learning)** for the reinforcement learning loop.

## Model Details

### Model Description

- **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

## How to Get Started with the Model

Use the code below to load and run the model with vLLM & Unsloth:

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Load the model & tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning",  # replace with this model's full Hugging Face repo ID
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True,
    gpu_memory_utilization = 0.7,
)

# Prepare the message
PROMPT = "How many r's are in the word strawberry?"
SYSTEM_PROMPT = """
A conversation between User and Assistant. The user asks a question, and the Assistant solves it.
The assistant first thinks about the reasoning process in the mind and then provides the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": PROMPT},
], tokenize = False, add_generation_prompt = True)

# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text

print(output)
```
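Because the system prompt steers generations into the `<reasoning>`/`<answer>` format, downstream code usually needs only the text inside the `<answer>` tags. Below is a minimal sketch of such a parser; the `extract_answer` helper is illustrative, not part of this model's API, and it assumes the model actually followed the tagged format (small models occasionally do not, hence the `None` fallback):

```python
import re

def extract_answer(response: str):
    """Return the text inside <answer>...</answer>, or None if absent."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1) if match else None

answer = extract_answer(output)
if answer is None:
    print("Model did not follow the tagged format:", output)
else:
    print("Final answer:", answer)
```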
## Uses

### Direct Use

This model is intended for **mathematical reasoning tasks**, particularly grade-school-level word problems like those in the GSM8K dataset. It can be used directly for question answering that involves arithmetic and multi-step reasoning.

### Downstream Use

The model can be fine-tuned further for specific applications, such as tutoring systems, automated problem-solving tools, or other educational technologies.

### Out-of-Scope Use

This model is not designed for:

- High-level mathematical research or advanced problem-solving.
- Non-mathematical reasoning tasks without additional fine-tuning.
- Applications requiring high precision in domains outside its training data.

## Bias, Risks, and Limitations

- **Bias:** The model may inherit biases present in the GSM8K dataset or the base model.
- **Risks:** Incorrect reasoning or answers in critical applications (e.g., education or finance) could lead to misinformation.
- **Limitations:** The model's performance is constrained by the quality and scope of the GSM8K dataset and by the base model's capabilities.

### Recommendations

Users should:

- Validate the model's outputs before relying on them in critical applications.
- Fine-tune the model further for domain-specific tasks (a minimal GRPO training sketch follows the citations below).
- Be aware of potential biases and limitations in the model's reasoning capabilities.

## Citations

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
    title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year   = 2024,
    eprint = {arXiv:2402.03300},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
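As referenced in the Recommendations above, the sketch below outlines how a comparable GRPO run on GSM8K can be set up with Unsloth and TRL's `GRPOTrainer`. It is a minimal illustration under stated assumptions, not the exact recipe used to train this model: the LoRA settings, hyperparameters, and single correctness reward are placeholders, and real runs typically combine several reward functions (format adherence, integer answers, correctness).

```python
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Abbreviated version of the system prompt from the quickstart above
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"
)

# Base model with LoRA adapters, loaded through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-0.5B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True,  # vLLM backend for the GRPO rollouts
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# GSM8K: build chat-style prompts and keep the gold final answer
# (GSM8K reference solutions end with "#### <number>")
def to_example(x):
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": x["question"]},
        ],
        "answer": x["answer"].split("####")[-1].strip(),
    }

dataset = load_dataset("openai/gsm8k", "main", split = "train").map(to_example)

def correctness_reward(completions, answer, **kwargs):
    """Placeholder reward: 1.0 when the <answer> block matches the gold answer."""
    def extract(text):
        m = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
        return m.group(1).strip() if m else None
    responses = [c[0]["content"] for c in completions]
    return [1.0 if extract(r) == a else 0.0 for r, a in zip(responses, answer)]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [correctness_reward],
    args = GRPOConfig(
        use_vllm = True,                   # sample rollouts with vLLM
        learning_rate = 5e-6,
        per_device_train_batch_size = 8,   # must be divisible by num_generations
        num_generations = 8,               # the "group" GRPO normalizes rewards over
        max_prompt_length = 256,
        max_completion_length = 1024,
        max_steps = 250,
    ),
    train_dataset = dataset,
)
trainer.train()
```

The group size (`num_generations`) is the key GRPO knob: each prompt is sampled several times and rewards are normalized within that group, which is what lets GRPO dispense with a separate value model.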