Making Gemma 3 think
Published March 13, 2025
Everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd write a longer post on the tips and sharp edges I've found. Let's go!
- You have to install everything from main and nightly. This is what I'm working with to get Unsloth and TRL running:
```
git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft
```

plus these with `--no-deps`:

```
git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
```
- Vision fine-tuning isn't available in TRL's GRPOTrainer yet, so stick to text datasets. But there's no need to load the model differently in Transformers or Unsloth:
```python
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")
```
- Will Brown's code to turn GSM8K into a reasoning dataset is a nice toy experiment: https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
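If you don't want to click through, the rough idea is to wrap each GSM8K question in a system prompt that asks for `<reasoning>`/`<answer>` tags, keep the gold answer around, and reward completions that follow the format and get the answer right. Here's a minimal sketch of that idea; the prompt wording, tag names, reward values, and function names are placeholders, not necessarily what the gist uses:

```python
import re
from datasets import load_dataset

SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>...</reasoning>\n"
    "<answer>...</answer>"
)

def extract_gsm8k_answer(answer_text):
    # GSM8K gold answers end with "#### <number>"
    return answer_text.split("####")[-1].strip()

def to_reasoning_example(example):
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["question"]},
        ],
        "answer": extract_gsm8k_answer(example["answer"]),
    }

dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_reasoning_example)

def format_reward(completions, **kwargs):
    # reward completions that stick to the <reasoning>/<answer> template
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    return [1.0 if re.search(pattern, r, re.DOTALL) else 0.0 for r in responses]

def correctness_reward(completions, answer, **kwargs):
    # reward completions whose <answer> matches the gold answer
    responses = [completion[0]["content"] for completion in completions]
    rewards = []
    for response, gold in zip(responses, answer):
        match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
        rewards.append(2.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards
```

GRPOTrainer calls each reward function with the generated completions plus any extra dataset columns (like `answer` here) as keyword arguments, so both functions can go straight into `reward_funcs`.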
- With a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.
- So far none of my runs have degraded the outputs after 1 epoch, so I'm mainly experimenting with bigger LoRA adapters.
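"Bigger" here just means a higher LoRA rank and alpha than the usual small defaults. A minimal sketch with plain PEFT; the specific rank, alpha, and target modules below are placeholders for illustration, not a recommendation:

```python
from peft import LoraConfig

# hypothetical "bigger" adapter: higher rank/alpha than the common r=8 or r=16 defaults
lora_config = LoraConfig(
    r = 64,
    lora_alpha = 64,
    lora_dropout = 0.05,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type = "CAUSAL_LM",
)
```

You can hand this to GRPOTrainer through its `peft_config` argument. The GRPOConfig itself: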
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)
```
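To tie it together, this is roughly how that config plugs into GRPOTrainer with the dataset, reward functions, and LoRA config sketched above (those names come from my sketches above, so adjust them to whatever you actually built):

```python
from trl import GRPOTrainer

# minimal sketch: dataset, reward functions, and lora_config come from the snippets above
trainer = GRPOTrainer(
    model = model,                # the Gemma 3 model loaded earlier
    args = training_args,
    train_dataset = dataset,
    reward_funcs = [format_reward, correctness_reward],
    peft_config = lora_config,
)
trainer.train()
```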
If you want an introduction to GRPO, check out the reasoning course; it walks you through the algorithm, theory, and implementation in a smooth way.