Qwen 2.5 3B Reasoner

This model is a fine-tuned version of the Qwen 2.5 3B model optimized for reasoning tasks.

Training Details

  • Base model: Qwen 2.5 3B
  • Training method: GRPO (Group Relative Policy Optimization)
  • Training focus: Enhanced reasoning capabilities
  • Training parameters: [Add any important hyperparameters]
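
For readers unfamiliar with GRPO, its central step is scoring each sampled completion relative to the other completions drawn for the same prompt, rather than using a learned value model. The sketch below shows only that group-relative advantage computation. It is not the training code used for this model: the function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration; `eps` guards against division by
    zero when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. A group with identical rewards yields zero advantages and therefore no learning signal.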

Intended Use

This model is designed for tasks requiring strong reasoning abilities, including logical problems, multi-step reasoning chains, and complex decision-making scenarios.

Limitations

[Note any known limitations]

Model Details

  • Format: GGUF
  • Model size: 3.09B params
  • Architecture: qwen2
  • Quantization: 4-bit
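
As a rough guide to what the figures above imply for memory, the weights alone can be estimated from the parameter count and the bits stored per weight. This is a back-of-the-envelope sketch: the function name is hypothetical, and real 4-bit GGUF files typically use somewhat more than 4 bits per weight on average and need additional memory for the KV cache and runtime buffers.

```python
def estimate_weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# 3.09B parameters at a nominal 4 bits each: about 1.5 GB of weights.
approx_gb = estimate_weight_memory_gb(3.09e9, 4)
```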
