Better than distilled version

#5
by urtuuuu - opened

Tested it on lots of questions, and it's my favorite model now. Not only does R1-Distill-Qwen-14B perform worse, but apparently even the 32B does. If anyone can prove me wrong, give some examples.

There is a guy on YouTube who tested the full DeepSeek-R1 vs OpenAI o3-mini, and maybe some other models, on this question >>> "Write a snake game code in html". OpenAI failed; only R1 succeeded. I decided to test Qwen2.5-14B-Instruct-1M on this, and lol, it also wrote correct code (with a little bug though, but it corrected it after I described the bug).
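For context on what that test actually asks for, the core of a snake game boils down to a small amount of grid logic; the rendering (HTML canvas, key handlers) is mostly boilerplate on top. Here is a minimal sketch of that logic in plain JavaScript — all names and the board layout are illustrative, not the code any of the models produced:

```javascript
// Minimal snake-game logic (rendering omitted). The snake is an array of
// [x, y] cells, head first, on a gridSize x gridSize board.
function createGame(gridSize) {
  return {
    gridSize,
    snake: [[2, 0], [1, 0], [0, 0]], // head at [2, 0], moving right
    dir: [1, 0],                     // current direction: +x
    food: [5, 0],
    gameOver: false,
  };
}

// Advance one tick: move the head, grow when food is eaten,
// end the game on wall or self collision.
function step(g) {
  if (g.gameOver) return g;
  const [hx, hy] = g.snake[0];
  const head = [hx + g.dir[0], hy + g.dir[1]];
  const hitWall =
    head[0] < 0 || head[1] < 0 ||
    head[0] >= g.gridSize || head[1] >= g.gridSize;
  const hitSelf = g.snake.some(([x, y]) => x === head[0] && y === head[1]);
  if (hitWall || hitSelf) {
    g.gameOver = true;
    return g;
  }
  g.snake.unshift(head);
  if (head[0] === g.food[0] && head[1] === g.food[1]) {
    // Ate the food: keep the tail (snake grows) and place new food.
    g.food = [
      Math.floor(Math.random() * g.gridSize),
      Math.floor(Math.random() * g.gridSize),
    ];
  } else {
    g.snake.pop(); // normal move: drop the tail
  }
  return g;
}
```

A full answer to the prompt would wrap this in an HTML page with a `<canvas>` redraw loop and arrow-key handlers updating `dir` — which is roughly where the "little bug" class of mistakes tends to show up.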

urtuuuu changed discussion title from Better that distilled version to Better than distilled version

I have had doubts as well. I'm not really seeing any radical improvement, but not any major degradation either, in comparison. The R1 versions do run a little faster on my hardware, though, so it may be a wash for me.

I think both may have their place in the world. My guess at this stage is that if the model has to 'figure it out', R1 might be better with the extra overhead, but if it's straightforward, then the 'base' will be better, with less noise. But that is a guess; I haven't really had time to prove that suspicion.

EDIT: and this is mostly on the 14B models.

I played with the R1 distilled version and then found my way to this one, and this is much smoother and smarter. For reasoning, I loaded it with a system prompt designed to approximate reasoning, and it surprised us all how smart it was. Very smart model, especially given its size.

@Windsage Care to share that prompt? I'd be curious to see it.

https://gist.github.com/Maharshi-Pandya/4aeccbe1dbaa7f89c182bd65d2764203
