AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K-LoRA

LoRA-adapter only from AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K. See original model card for additional details.

This adapter is a GRPO fine-tuned version of unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit on a subset of 2,000 examples from openai/gsm8k using Unsloth.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support