---
library_name: transformers
tags:
- unsloth
- trl
- grpo
- reasoning
- gsm8k
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: question-answering
license: apache-2.0
---

# Model Card for Qwen2.5-0.5B-Instruct-GSM8K-Reasoning

This model is a fine-tuned version of **Qwen2.5-0.5B-Instruct**, adapted for **mathematical reasoning** on the **GSM8K dataset**. It was trained with **GRPO (Group Relative Policy Optimization)**, the reinforcement learning method introduced in the *DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models* paper, to strengthen its step-by-step reasoning. Fine-tuning used **Unsloth** for memory- and speed-efficient training and **TRL (Transformer Reinforcement Learning)** for the reinforcement learning loop.
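
The exact training script is not published with this card, but a minimal sketch of what a GRPO run over GSM8K could look like with TRL's `GRPOTrainer` is shown below. The reward function, prompt formatting, and hyperparameters are illustrative assumptions rather than the settings used for this model, and the sketch uses plain TRL (the actual run also relied on Unsloth for efficiency).

```python
# Illustrative GRPO sketch with TRL on GSM8K.
# NOTE: the reward function, prompt format, and hyperparameters below are
# assumptions for illustration; they are not the exact settings used here.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"
)

def extract_gold_answer(solution: str) -> str:
    # GSM8K solutions end with "#### <final answer>".
    return solution.split("####")[-1].strip()

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {
    "prompt": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": row["question"]},
    ],
    "answer": extract_gold_answer(row["answer"]),
})

def correctness_reward(completions, answer, **kwargs):
    # Reward 2.0 when the <answer> block matches the gold answer, else 0.0.
    rewards = []
    for completion, gold in zip(completions, answer):
        text = completion[0]["content"]
        match = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
        rewards.append(2.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="qwen2.5-0.5b-gsm8k-grpo",   # placeholder output path
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    num_generations=8,                       # group size for GRPO
    max_prompt_length=256,
    max_completion_length=512,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=[correctness_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```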

## Model Details

### Model Description

- **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

## How to Get Started with the Model

Use the code below to load and use the model with vLLM & Unsloth:

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams
import torch

# Load the Model & Tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K",
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True,
    gpu_memory_utilization = 0.7,
)

# Prep the Message
PROMPT = "How many r's are in the word strawberry?"

SYSTEM_PROMPT = """
A conversation between User and Assistant. The user asks a question,
and the Assistant solves it. The assistant first thinks about the
reasoning process in the mind and then provides the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : PROMPT},
], tokenize = False, add_generation_prompt = True)

# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text

print(output)
```
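
The model is trained to wrap its output in `<reasoning>` and `<answer>` tags, so the final answer can be pulled out with a little post-processing. The snippet below (which reuses the `output` string from the example above) shows one possible way to do this with a regular expression; it is a convenience sketch, not part of the model's API.

```python
import re

# `output` is the generated text from the example above.
reasoning_match = re.search(r"<reasoning>\s*(.*?)\s*</reasoning>", output, re.DOTALL)
answer_match = re.search(r"<answer>\s*(.*?)\s*</answer>", output, re.DOTALL)

reasoning = reasoning_match.group(1) if reasoning_match else None
answer = answer_match.group(1) if answer_match else output.strip()  # fall back to the raw text

print("Reasoning:", reasoning)
print("Answer:", answer)
```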

## Uses

### Direct Use

This model is intended for **mathematical reasoning tasks**, particularly for solving grade-school-level math problems as found in the GSM8K dataset. It can be used directly for question-answering tasks involving arithmetic and reasoning.
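
Because the checkpoint is a standard `transformers` causal language model, it can also be loaded without Unsloth or vLLM. The snippet below is a minimal sketch using `AutoModelForCausalLM`; the repo ID is a placeholder for wherever this model is hosted on the Hub, and the abridged system prompt mirrors the one from the quick-start example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with this model's actual Hugging Face Hub repo ID.
model_id = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```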

### Downstream Use

The model can be fine-tuned further for specific applications, such as tutoring systems, automated problem-solving tools, or other educational technologies.
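
As an illustration, continued supervised fine-tuning on a domain-specific dataset could look roughly like the sketch below, using TRL's `SFTTrainer`; the dataset name, checkpoint path, and hyperparameters are placeholders, not a recommended recipe.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: any chat-formatted dataset for the target domain
# (e.g. with a "messages" column in the usual role/content format).
dataset = load_dataset("your-org/your-domain-dataset", split="train")

trainer = SFTTrainer(
    model="path-or-repo-id-of-this-model",  # placeholder for this checkpoint
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen2.5-0.5b-gsm8k-domain-sft",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
)
trainer.train()
```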

### Out-of-Scope Use

This model is not designed for:
- High-level mathematical research or advanced problem-solving.
- Non-mathematical reasoning tasks without additional fine-tuning.
- Applications requiring high precision in domains outside its training data.

## Bias, Risks, and Limitations

- **Bias:** The model may inherit biases present in the GSM8K dataset or the base model.
- **Risks:** Incorrect reasoning or answers in critical applications (e.g., education or finance) could lead to misinformation.
- **Limitations:** The model's performance is constrained by the quality and scope of the GSM8K dataset and the base model's capabilities.

### Recommendations

Users should:
- Validate the model's outputs for critical applications.
- Fine-tune the model further for domain-specific tasks.
- Be aware of potential biases and limitations in reasoning capabilities.

## Citations

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}
```

Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```