A questionable choice for testing...

by pix2pix - opened 1 day ago


Hello, I have a question: why did you choose Qwen 3.6 over 3.5? And why MoE rather than dense? In my opinion, their MoE models aren't particularly interesting for RP, unlike the Gemma 4 26B. The 3.5 version has juicy prose, but it doesn't follow instructions as well as 3.6 does. Also, I'd like to hear your decision regarding the Gemma 4 12B.

Gryphe

Owner 1 day ago

I went with MoE because it was much quicker to test, and 3.6 is a known quantity to me. I'm not too familiar with the differences between 3.5 and 3.6 myself, and you're the first to point these out, so thank you for that at least!

As for Gemma 12B? I haven't made a decision there simply because I haven't had time to test it yet, simple as that.

This specific style tune's main purpose was to check what happens if I tackle non-Gemma architectures, first and foremost. The data I collect from this I'll be able to apply to future style tunes.

pix2pix

1 day ago

Thanks for your answers. I also tested your Gemma 26B tune. My opinion: it writes beautifully and vividly! But it loses consistency after about 16 thousand tokens of context. The original, from my observations, broke after 32 thousand tokens. Plus, there's an interesting bug: the thinking mode breaks, but it happens at random and doesn't depend on the amount of context. The bug is not critical, but it's something to keep in mind. Otherwise, the pros definitely outweigh the cons!

PsiCat

about 20 hours ago


edited about 19 hours ago

I asked some neural networks about this. They also believe that version 3.5 is better for roleplay, while 3.6 holds the parts better and follows instructions more closely. At the same time, the two are not very different technologically, because their release dates are only 2 months apart. It's also harder to get rid of the robotic tone in version 3.6, and 3.5 should generate more literary text. Both versions are multimodal, though. That's what I read:
"The Qwen 3.6 version does not change the hardware configuration, but is a software and algorithmic update (post-training, datasets and alignment)."
So it should be essentially the same network. Well, this looks interesting. Maybe this StyleTune will solve the main RP problem with 3.6. I'll test it in a few days.

Reply to pix2pix: MoE is simply fast on a mid-range gaming PC. I think the 12-14B RP models are a bit silly, and I want not only a character but also a smart home companion. In addition, larger networks can show deeper behavior and justify it, rather than simply repeating what they saw, like a distilled model. But my computer (12GB VRAM + 32GB RAM) can't run 20B+ dense models fast. It's slow torture (3-5 t/s). A 35B A3B model, on the other hand, runs without any problems (20-30 t/s).

Tibbnak

about 8 hours ago

PsiCat if your funny robots weren't so busy gaslighting you they might have also pointed out that
1- Their models are more heavily trained in Chinese (a pro-drop aspectual language), which has a different structure to English (a primarily morphological language) and gets in the way of things like tense, past/present/future participles, and pronoun usage.
2- 3.6 qwen is almost exclusively just 3.5 but deep fried on openclaw and agentic tasks, which'd probably getchya yer extra clubheadedness der.
