Qwemini-1.7B-Beta

Qwen3-1.7B fine-tuned on 250 Gemini 3 Pro chain-of-thought traces.

The grown-up version of Qwemini-0.5B-Alpha: same teacher, same approach, but a base model that actually has the architecture to use it.

What it is

QLoRA fine-tune of Qwen3-1.7B on 250 Gemini 3 Pro structured reasoning traces. The goal was pure style transfer: Qwen3 already knows how to reason, and this teaches it how we want it to reason. Native <think> token support is what sets it apart from the 0.5B predecessor.

Training

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Method | QLoRA (4-bit NF4, LoRA r=16) |
| Dataset | 250 Gemini 3 Pro CoT traces |
| Hardware | RTX 4060 8GB |
| Attention | FlashAttention 2 |
| Packing | Enabled |
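
For readers who want to reproduce a similar setup, here is a minimal sketch of how the settings above might map onto the Hugging Face `transformers` and `peft` APIs. Only the base model, NF4 quantization, `r=16`, and FlashAttention 2 come from the table; `lora_alpha`, `lora_dropout`, `target_modules`, and the bf16 compute dtype are assumptions chosen for illustration, and the actual training script may differ. Sequence packing is a trainer-level option and is not shown.

```python
# Values marked ASSUMED are illustrative and not stated in this model card.
QLORA_SETTINGS = {
    "base_model": "Qwen/Qwen3-1.7B",              # from the table
    "quant_type": "nf4",                          # from the table (4-bit NF4)
    "lora_r": 16,                                 # from the table
    "lora_alpha": 32,                             # ASSUMED
    "lora_dropout": 0.05,                         # ASSUMED
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # ASSUMED
    "attn_implementation": "flash_attention_2",   # from the table
}


def load_qlora_model(settings=QLORA_SETTINGS):
    """Load the 4-bit base model and attach LoRA adapters (needs a CUDA GPU)."""
    # Imported lazily so the settings dict can be inspected without a GPU stack.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type=settings["quant_type"],
        bnb_4bit_compute_dtype=torch.bfloat16,  # ASSUMED compute dtype
    )
    model = AutoModelForCausalLM.from_pretrained(
        settings["base_model"],
        quantization_config=bnb_config,
        attn_implementation=settings["attn_implementation"],
        torch_dtype=torch.bfloat16,
    )
    # Prepares the quantized model for training (e.g. gradient checkpointing).
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=settings["lora_r"],
        lora_alpha=settings["lora_alpha"],
        lora_dropout=settings["lora_dropout"],
        target_modules=settings["target_modules"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```

Calling `load_qlora_model()` returns a PEFT-wrapped model that can be handed to a supervised fine-tuning trainer of your choice.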

Eval results

| Prompt | Result | Notes |
|---|---|---|
| Bat & ball ($1.10 problem) | ⚠️ Wrong answer, right process | Got $0.10, but thinking block caught its own error and rationalized past it anyway |
| 1/2 of 12 Fish drowning | ⚠️ Near miss | Noted "ambiguity in the question's phrasing" inside think block, answered 6 anyway; closest any model got to catching the false premise |
| Jug problem (3gal + 5gal = 4gal) | ✅ Correct strategy | Thinking block described the correct solution perfectly, written steps got slightly garbled |
| Pills trick (3 pills, every 30 min) | ⚠️ Contradicted itself | Produced two different answers (60 min and 90 min) in the same response without resolving the conflict |
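
As a reference point for the jug prompt, the puzzle has a short known solution. The sketch below is not part of the eval harness; it is a minimal breadth-first search, assuming the standard rules (fill a jug, empty a jug, or pour one into the other until the source is empty or the destination is full). The function name `solve_jugs` is hypothetical.

```python
from collections import deque


def solve_jugs(cap_a=3, cap_b=5, target=4):
    """Return the shortest list of (a, b) states that reaches `target`, or None."""
    start = (0, 0)
    parent = {start: None}  # maps each visited state to the state it came from
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a == target or b == target:
            # Walk the parent links back to the start to rebuild the path.
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # amount that fits when pouring A into B
        pour_ba = min(b, cap_a - a)  # amount that fits when pouring B into A
        moves = [
            (cap_a, b), (a, cap_b),            # fill A, fill B
            (0, b), (a, 0),                    # empty A, empty B
            (a - pour_ab, b + pour_ab),        # pour A into B
            (a + pour_ba, b - pour_ba),        # pour B into A
        ]
        for nxt in moves:
            if nxt not in parent:
                parent[nxt] = (a, b)
                queue.append(nxt)
    return None


path = solve_jugs()
print(len(path) - 1, "moves:", path)
```

With the default capacities, the search finds a six-move solution that ends with 4 gallons in the 5-gallon jug, which gives a concrete sequence to compare the model's written steps against.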

The big finding

Thinking tags activated unprompted.

Qwen3's native thinking architecture survived the fine-tune intact. The model genuinely uses an internal scratchpad before answering rather than just formatting its output to look like reasoning. This is qualitatively different from every other model in the Qwemini/YapLlama/AtomCoT family — those learned the costume of reasoning. This one is actually thinking, just not always correctly.
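
To inspect this behavior yourself, it helps to separate the scratchpad from the final answer. The following is a minimal sketch, assuming the completion wraps its reasoning in a single `<think>...</think>` pair as Qwen3 does by default; `split_thinking` is a hypothetical helper, not something shipped with this model.

```python
import re

# Non-greedy match so only the first <think>...</think> pair is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(response: str) -> tuple[str, str]:
    """Return (thinking, answer) from a raw completion string."""
    match = THINK_RE.search(response)
    if match is None:
        # No think block: treat the whole completion as the answer.
        return "", response.strip()
    thinking = match.group(1).strip()
    answer = (response[:match.start()] + response[match.end():]).strip()
    return thinking, answer


raw = "<think>\n0.10 + 1.10 = 1.20\n</think>\nThe ball costs $0.10."
thinking, answer = split_thinking(raw)
print(thinking)  # 0.10 + 1.10 = 1.20
print(answer)    # The ball costs $0.10.
```

Comparing the two parts side by side is how mismatches like the ones in the eval table (correct scratchpad, garbled or contradictory answer) become visible.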

Honest assessment

The failure modes are completely different from the smaller model's. Instead of confident wrong answers or structured nonsense, you get a model that notices problems, almost catches false premises, and occasionally argues with itself.

The bat and ball error is the most interesting result: the thinking block explicitly computed $1.20 ≠ $1.10 and then declared the solution valid anyway. It's not that it can't detect errors — it's that it doesn't always act on them. More data and more epochs would likely close this gap.
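
The check the thinking block performed can be written out in a few lines. This sketch assumes the standard wording of the puzzle (bat and ball cost $1.10 together, and the bat costs $1.00 more than the ball) and works in integer cents to avoid floating-point rounding; `check_bat_and_ball` is a hypothetical helper.

```python
def check_bat_and_ball(ball_cents: int, total_cents: int = 110, diff_cents: int = 100) -> bool:
    """Return True if the proposed ball price satisfies both puzzle constraints."""
    bat_cents = ball_cents + diff_cents          # the bat costs $1.00 more
    return ball_cents + bat_cents == total_cents  # together they must cost $1.10


print(check_bat_and_ball(10))  # False: 10 + 110 = 120, i.e. $1.20, not $1.10
print(check_bat_and_ball(5))   # True: 5 + 105 = 110
```

The model's answer of $0.10 fails this check with exactly the $1.20 total it computed, while $0.05 passes.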

Compared to Qwemini-0.5B-Alpha

| | 0.5B-Alpha | 1.7B-Beta |
|---|---|---|
| Native thinking tags | | |
| Bat & ball | ✅ Correct | ⚠️ Wrong but self-aware |
| Premise checking | | ⚠️ Almost |
| Jug problem | ❌ Hallucinated | ✅ Correct strategy |
| Reasoning quality | Structured correct | Genuinely thinking |

What would improve it

  • More epochs: loss was still healthy at the checkpoint, so there is room to keep learning
  • Premise-checking traces: it almost caught the fish problem, and 50 targeted examples would probably close the gap
  • More data and more variety (e.g. 6000 rows, Gemini 3.1 + Opus 4.6) is the natural next step

Part of

The Qwemini model family: Qwen models fine-tuned for structured reasoning.

| Model | Params | Thinking tags | Actually reasons |
|---|---|---|---|
| Qwemini-0.5B-Alpha | 500M | | ✅ simple problems |
| Qwemini-1.7B-Beta | 1.7B | | ✅ with self-correction attempts |