AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K-LoRA
LoRA-adapter only from AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K. See original model card for additional details.
This adapter is a GRPO fine-tuned version of unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit on a subset of 2,000 examples from openai/gsm8k using Unsloth.
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support
HF Inference deployability: The model has no pipeline_tag.