AdamLucek
/

Qwen2.5-3B-Instruct-GRPO-2K-GSM8K-LoRA

text-generation-inference

Model card Files Files and versions Community

AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K-LoRA

LoRA-adapter only from AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K. See original model card for additional details.

This adapter is a GRPO fine-tuned version of unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit on a subset of 2,000 examples from openai/gsm8k using Unsloth.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support