gemma3-4b-thinking

This model is a fine-tuned version of google/gemma-3-4b-it trained to enhance its reasoning and step-by-step thinking capabilities. It was trained using TRL with GRPO (Group Relative Policy Optimization).

Model Description

This model was specifically tuned to demonstrate step-by-step reasoning when solving problems, particularly mathematical word problems. The training process used reinforcement learning to reward the model for:

  • Providing clear reasoning steps
  • Using logical deduction
  • Arriving at the correct numerical answer

Quick start

from transformers import pipeline, AutoProcessor

# Load the model and processor
processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking")
generator = pipeline(
    "text-generation",
    model="real-jiakai/gemma3-4b-thinking",
    tokenizer=processor.tokenizer,
)

# Example math problem
question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"

# Format the input with the chat template; add_generation_prompt appends the
# assistant turn marker so the model answers rather than continuing the prompt
input_text = processor.apply_chat_template(
    [{"role": "user", "content": [{"type": "text", "text": question}]}],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate a response with reasoning
output = generator(input_text, max_new_tokens=1024)
print(output[0]["generated_text"])
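For this particular problem the model should reason that 56 boys + 44 girls = 100 students, split evenly across 4 classrooms, and give a final answer of 25 students per classroom.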

Model Performance

The model demonstrates enhanced reasoning capabilities compared to the base model, particularly for:

  • Mathematical word problems
  • Step-by-step logical deduction
  • Breaking complex problems into solvable components

Training Procedure

This model was trained with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
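In brief, GRPO samples a group of G completions for each prompt, scores them with the reward functions, and replaces PPO's learned value model with the group's own statistics: each completion's advantage is its reward normalized within the group,

    A_i = (r_i - mean(r_1, ..., r_G)) / std(r_1, ..., r_G)

which keeps memory costs well below PPO-style RLHF while still pushing probability mass toward above-average completions.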

Training Details

  • Dataset: GSM8K (Grade School Math 8K), a dataset of diverse grade school math word problems
  • Fine-tuning Method: GRPO (Group Relative Policy Optimization)
  • Training Steps: 100
  • Batch Size: 2
  • Learning Rate: 5e-6
  • Hardware: A100 80GB GPU
  • Parameter-Efficient Fine-Tuning: LoRA with r=16, alpha=32 (see the sketch after this list)
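The exact training script is not published with this card, so the following is only a minimal sketch of how the settings above map onto TRL's GRPOTrainer. The dataset preparation, the num_generations value, and the inline reward stub are assumptions; a fuller correctness reward is sketched under Reward Functions below.

from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# GSM8K ships "question" and "answer" columns; GRPOTrainer expects a "prompt"
# column, so rename (an assumption about how the data was prepared)
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def correctness_reward(completions, answer, **kwargs):
    # Stand-in reward: 1.0 if the reference answer (the text after "####")
    # appears in the completion; see the fuller sketch under Reward Functions
    targets = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if t in c else 0.0 for c, t in zip(completions, targets)]

training_args = GRPOConfig(
    output_dir="gemma3-4b-thinking",
    max_steps=100,                  # Training Steps
    per_device_train_batch_size=2,  # Batch Size
    learning_rate=5e-6,             # Learning Rate
    num_generations=2,              # assumption: must divide the batch size
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32),  # LoRA settings from above
)
trainer.train()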

Reward Functions

The training used multiple reward functions to guide the model (see the sketch after this list):

  • Correctness of final answer
  • Using proper numerical formats
  • Demonstrating clear reasoning steps
  • Following structured formats
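The exact implementations are not included with this card. As an illustration, a correctness reward in TRL's reward-function interface (dataset columns arrive as keyword arguments; completions are plain strings for a non-conversational dataset) might look like the following; the regex and the GSM8K "####" answer convention are the only assumptions:

import re

def correctness_reward(completions, answer, **kwargs):
    """Return 1.0 when the last number in a completion matches the GSM8K
    reference answer (the text after '####'), else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        target = ref.split("####")[-1].strip().replace(",", "")
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1] == target else 0.0)
    return rewards

GRPOTrainer accepts a list of such functions and sums their outputs (optionally weighted via reward_weights in GRPOConfig), so format rewards, for example checking for a structured reasoning-then-answer layout, can be layered on top of correctness.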

Framework versions

  • TRL: 0.16.0.dev0
  • Transformers: 4.50.0.dev0
  • PyTorch: 2.6.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.1

Limitations

  • The model sometimes reverts to its base output format rather than following the structured reasoning format used during training
  • Performance may vary across different types of problems
  • The model is primarily optimized for mathematical reasoning and may not show the same level of improvement on other tasks

Ethics and Responsible Use

  • This model is intended to demonstrate reasoning capabilities and should not be used as a sole solution for educational assessments
  • Users should verify mathematical results independently for critical applications
  • The model can still make reasoning errors despite showing its work

Citations

@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@article{shao2024deepseekmath,
  title={DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Y and others},
  journal={arXiv preprint arXiv:2402.03300},
  year={2024}
}