Better than distilled version

#5
by urtuuuu - opened

Tested it on lots of questions, and it's my favorite model now. Not only does R1-Distill-Qwen-14B perform worse, but apparently even the 32B does. If anyone can prove me wrong, give some examples.

There is a guy on YouTube who tested the full DeepSeek-R1 vs OpenAI o3-mini, and maybe some other models, on this question >>> "Write a snake game code in html". OpenAI failed; only R1 succeeded. I decided to test Qwen2.5-14B-Instruct-1M on this, and lol, it also wrote correct code (with a little bug though, but it corrected it after I described the bug).
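For context on what that test actually asks for, the core of a snake game boils down to a small amount of grid logic; the rendering (HTML canvas, key handlers) is mostly boilerplate on top. Here is a minimal sketch of that logic in plain JavaScript — all names and the board layout are illustrative, not the code any of the models produced:

```javascript
// Minimal snake-game logic (rendering omitted). The snake is an array of
// [x, y] cells, head first, on a gridSize x gridSize board.
function createGame(gridSize) {
  return {
    gridSize,
    snake: [[2, 0], [1, 0], [0, 0]], // head at [2, 0], moving right
    dir: [1, 0],                     // current direction: +x
    food: [5, 0],
    gameOver: false,
  };
}

// Advance one tick: move the head, grow when food is eaten,
// end the game on wall or self collision.
function step(g) {
  if (g.gameOver) return g;
  const [hx, hy] = g.snake[0];
  const head = [hx + g.dir[0], hy + g.dir[1]];
  const hitWall =
    head[0] < 0 || head[1] < 0 ||
    head[0] >= g.gridSize || head[1] >= g.gridSize;
  const hitSelf = g.snake.some(([x, y]) => x === head[0] && y === head[1]);
  if (hitWall || hitSelf) {
    g.gameOver = true;
    return g;
  }
  g.snake.unshift(head);
  if (head[0] === g.food[0] && head[1] === g.food[1]) {
    // Ate the food: keep the tail (snake grows) and place new food.
    g.food = [
      Math.floor(Math.random() * g.gridSize),
      Math.floor(Math.random() * g.gridSize),
    ];
  } else {
    g.snake.pop(); // normal move: drop the tail
  }
  return g;
}
```

A full answer to the prompt would wrap this in an HTML page with a `<canvas>` redraw loop and arrow-key handlers updating `dir` — which is roughly where the "little bug" class of mistakes tends to show up.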

urtuuuu changed discussion title from Better that distilled version to Better than distilled version

I have had doubts as well. I'm not really seeing any radical improvement, but not any major degradation either, in comparison. The R1 versions do run a little faster on my hardware, though, so it may be a wash for me.

I think both may have their place in the world. My guess at this stage is that if the model has to 'figure it out', R1 might be better with the extra overhead, but if it's straightforward, then the 'base' will be better, with less noise. But that is a guess; I haven't really had time to prove that suspicion.

EDIT: and this is mostly on the 14B models.

I played with the R1 distilled version and then found my way to this one, and this is much smoother and smarter. For reasoning, I loaded it with a system prompt designed to approximate reasoning, and it surprised us all how smart it was. Very smart model, especially given its size.

@Windsage Care to share that prompt? I'd be curious to see it.

https://gist.github.com/Maharshi-Pandya/4aeccbe1dbaa7f89c182bd65d2764203
