# My Reasoning Model

This is my first reasoning model. It is fairly small, and yes, it still gets the wrong answer when asked how many r's are in the word "strawberry."

You are welcome to use the model as you wish.

## System Prompt Format

Respond in the following format:

    <reasoning>
    ...
    </reasoning>
    <answer>
    ...
    </answer>
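For example, here is a minimal sketch of prompting one of the GGUF files with this format via llama-cpp-python. The file name is a placeholder for whichever quantization you download, and parsing the `<answer>` tag is just one way to read the output:

```python
# Minimal usage sketch, assuming a downloaded GGUF file and llama-cpp-python.
# The file name below is a placeholder; substitute the quantization you use.
import re
from llama_cpp import Llama

llm = Llama(model_path="Qwen2.5-3B-Instruct-reason.Q4_K_M.gguf", n_ctx=4096)

SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n"
    "<answer>\n...\n</answer>"
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How many r's are in the word \"strawberry\"?"},
    ],
    max_tokens=512,
)
text = result["choices"][0]["message"]["content"]

# Pull the final answer out of the <answer> tags, falling back to the raw text.
match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
print(match.group(1).strip() if match else text)
```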

I fine-tuned the model on openai/gsm8k and, to keep costs from getting out of hand, trained on a single A100.
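For reference, here is a quick sketch of what the openai/gsm8k data looks like. This only illustrates the dataset format, not the full training pipeline; the reference answers end with a `#### <number>` marker that the model's `<answer>` section can be checked against:

```python
# Sketch: inspect the openai/gsm8k data format (not the training pipeline itself).
from datasets import load_dataset

dataset = load_dataset("openai/gsm8k", "main", split="train")

def extract_final_answer(answer_text: str) -> str:
    # gsm8k reference answers end with "#### <final number>".
    return answer_text.split("####")[-1].strip()

example = dataset[0]
print(example["question"])
print(extract_final_answer(example["answer"]))
```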


Enjoy, but please note that this model is experimental; I built it mainly to define my fine-tuning pipeline.

Next I will be testing fine-tuning on larger, more capable models, which I suspect will add more value in the short term.


---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- gguf
license: apache-2.0
language:
- en
---

# Uploaded model

- **Developed by:** dbands
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
GGUF quantizations are provided in 4-bit, 5-bit, and 8-bit variants (3.09B parameters, qwen2 architecture).