A questionable choice for testing...

#1
by pix2pix - opened

Hello, I have a question: why did you choose the 3.6 version of Qwen over the 3.5? Why MoE and not dense? In my opinion, their MoE solutions aren't particularly interesting for RP, unlike the Gemma 4 26B. The 3.5 version has juicy prose, but it doesn't follow instructions as well as the 3.6 version. Also, I'd like to hear your decision regarding the Gemma 4 12B.

I went with MoE because it was much quicker to test, and 3.6 is a known quantity to me. I'm not too familiar with the differences between 3.5 and 3.6 myself, and you're the first to point these out, so thank you for that at least!

As for Gemma 12B? I haven't made a decision there simply because I haven't had time to test it yet.

This specific style tune's main purpose was to check what happens if I tackle non-Gemma architectures, first and foremost. The data I collect from this I'll be able to apply to future style tunes.

Thanks for your answers. I also tested your Gemma 26 tune. My opinion: it writes beautifully and vividly! But it loses consistency after about 16 thousand context tokens. The original, from my observations, broke after 32 thousand tokens. Plus, there's an interesting bug (the thinking mode breaks, but it's random and doesn't depend on the amount of context). The bug is not critical, but something to keep in mind. Otherwise, the pros definitely outweigh the cons!

I talked to neural networks about this issue. They also believe that version 3.5 is better for roleplay. 3.6 holds the parts better and follows the instructions. At the same time, the two are not very different technologically, because their release dates are only 2 months apart. And it's harder to knock the robotic tone out of version 3.6. 3.5 should generate more literary text. However, both versions are multimodal. That's what I read.
"The Qwen 3.6 version does not change the hardware configuration, but is a software and algorithmic update (post-training, datasets and alignment)."
So it should be essentially the same network. Well, this looks interesting. But maybe this StyleTune will solve the main 3.6 RP problem. I'll test it in a few days.

Answer for pix2pix: MoE is simply fast on a mid-range gaming PC. I think the 12-14B RP models are a bit silly, and I want to get not only a character, but also a smart home companion. In addition, larger networks can provide deeper behavior and justify it, rather than simply repeating what they saw as a distiller. But my computer can't run 20B+ dense models fast (12GB VRAM + 32GB RAM). It's slow torture (3-5 t/s). But I can use 35B A3B without any problems (20-30 t/s).
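A rough back-of-the-envelope sketch of why the gap is that large, assuming decoding is limited by how fast the active weights can be read from memory. The 50 GB/s bandwidth, the 4-bit quantization, and the 24B dense size are illustrative assumptions, not measurements from this thread; the only figure taken from the model name is the roughly 3B active parameters implied by "A3B".

```python
def est_tokens_per_sec(active_params_b: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed, assuming every active weight
    is read from memory once per generated token."""
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical values, chosen only for illustration.
BANDWIDTH_GB_S = 50.0   # assumed effective memory bandwidth
BITS_PER_WEIGHT = 4.0   # assumed 4-bit quantization

dense = est_tokens_per_sec(24.0, BITS_PER_WEIGHT, BANDWIDTH_GB_S)  # assumed 24B dense model
moe = est_tokens_per_sec(3.0, BITS_PER_WEIGHT, BANDWIDTH_GB_S)     # ~3B active parameters (A3B)

print(f"dense: ~{dense:.1f} t/s, MoE: ~{moe:.1f} t/s, ratio: {moe / dense:.0f}x")
```

Under these assumptions the ratio depends only on the active parameter counts, so the MoE model comes out about 8x faster per token regardless of the bandwidth figure. Real speeds also depend on how the layers are split between VRAM and RAM, so treat this as an order-of-magnitude estimate only.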

PsiCat, if your funny robots weren't so busy gaslighting you, they might have also pointed out that
1- Their models are more heavily trained in Chinese (a pro-drop aspectual language), which has a different structure to English (a primarily morphological language) and gets in the way of things like tense and past/present/future participles and pronoun usage.
2- Qwen 3.6 is almost exclusively just 3.5 but deep fried on openclaw and agentic tasks, which'd probably getchya yer extra clubheadedness der.
