Qwen 2.5 3B Reasoner
This model is a fine-tuned version of the Qwen 2.5 3B model optimized for reasoning tasks.
Training Details
- Base model: Qwen 2.5 3B
- Training method: GRPO (Generalized Rejection on Policy Optimization)
- Training focus: Enhanced reasoning capabilities
- Training parameters: [Add any important hyperparameters]
Intended Use
This model is designed for tasks requiring strong reasoning abilities, including logical problems, multi-step reasoning chains, and complex decision-making scenarios.
Limitations
[Note any known limitations]
- Downloads last month
- 39
Inference Providers
NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API:
The model has no library tag.