Gemma-4-31B-StyleTune

A happy accident in surgical finetuning - roughly 60% fewer clichés, an entirely new writing style, and the same Gemma 4 31B you already know underneath. One tensor changed out of 834.

What is a style tune?

Normally when I finetune a model I train as much of it as possible, loading every tensor and transforming it to better approximate whatever's in my data. Not this time. This time I trained precisely one tensor: the lm_head output projection - the layer that turns the model's final hidden state into a score for every token it could emit next. Literally the last stop before text appears on your screen.

This specific tensor has a massive influence on a model's writing style, something I first discovered building MythoMax years ago. Gemma 31B is a VRAM-hungry monster, so the question became: how do I have the maximum impact with the minimum hardware requirements?

The answer: freeze everything else. All 60 transformer layers, all the attention heads, all the MLPs - completely untouched. Only lm_head trains, which means VRAM requirements drop dramatically, training completes in a single overnight run on consumer hardware, and every single one of Gemma's capabilities remains fully intact. The model hasn't changed. Only the voice has, and it's done so in the best way possible. (Obligatory disclaimer: I might be biased towards my own data.)
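A minimal sketch of the "freeze everything but lm_head" idea, not the actual training script. The helper name freeze_all_but, the parameter name "lm_head.weight" and the toy parameter list are all assumptions for illustration. The toy objects stand in for what a PyTorch model's named_parameters() would yield, so the snippet runs without any ML libraries installed.

```python
from types import SimpleNamespace


def freeze_all_but(named_params, trainable_name="lm_head.weight"):
    """Mark only `trainable_name` as trainable; return (trainable, total) tensor counts.

    `named_params` is any iterable of (name, param) pairs where each param has a
    `requires_grad` attribute, e.g. what model.named_parameters() yields in PyTorch.
    """
    trainable = 0
    total = 0
    for name, param in named_params:
        # Everything except the chosen tensor is frozen.
        param.requires_grad = (name == trainable_name)
        trainable += int(param.requires_grad)
        total += 1
    return trainable, total


# Hypothetical stand-in for a real model's parameters (names are illustrative only).
toy_params = [
    ("model.embed_tokens.weight", SimpleNamespace(requires_grad=True)),
    ("model.layers.0.self_attn.q_proj.weight", SimpleNamespace(requires_grad=True)),
    ("model.layers.0.mlp.up_proj.weight", SimpleNamespace(requires_grad=True)),
    ("lm_head.weight", SimpleNamespace(requires_grad=True)),
]

trainable, total = freeze_all_but(toy_params)
print(f"{trainable} of {total} tensors trainable")  # prints: 1 of 4 tensors trainable
```

With a real PyTorch model you would pass model.named_parameters() instead of the toy list, then hand the optimizer only the parameters that still have requires_grad set. Frozen tensors need no gradients or optimizer state, which is where the VRAM savings come from.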

I used the same data as my last Pantheon Reasoning release, with one notable exception: no instruct 24k set. 100% narrative data, certified cliché free.

What changed?

Benchmarked on 200 diverse roleplay prompts against the base instruct model:

Roughly 60% fewer clichés per 100 words (1.23 → 0.52, a 58% drop)
Only 21.7% shared trigram vocabulary - the model reaches for an almost entirely different set of phrases, with responses feeling much less sloppy as a result.
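For the curious, here is one plausible way to compute numbers like these. It is a sketch under assumptions, not the benchmark actually used: the cliché list, the word tokenizer, and the choice of Jaccard overlap for "shared trigram vocabulary" are all guesses on my part.

```python
import re


def words(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def cliches_per_100_words(text, cliches):
    """Count occurrences of each (lowercase) cliché phrase, normalised per 100 words."""
    toks = words(text)
    if not toks:
        return 0.0
    joined = " ".join(toks)
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", joined))
        for phrase in cliches
    )
    return 100.0 * hits / len(toks)


def trigrams(text):
    """Set of distinct word trigrams in the text."""
    toks = words(text)
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}


def shared_trigram_fraction(text_a, text_b):
    """Jaccard overlap of the two trigram sets (assumed definition of 'shared')."""
    a, b = trigrams(text_a), trigrams(text_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relative_drop(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# The reported cliché rates work out to a drop of about 58%.
print(round(100 * relative_drop(1.23, 0.52), 1))  # prints: 57.7
```

In practice you would concatenate all base-model responses into one text and all tuned-model responses into another, then compare the two.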

Considering we're talking about narrative data, it's hard to provide many other meaningful statistics - it's one of those "try it to understand it" kinda situations.

What didn't change?

Everything else. All the reasoning capability, world knowledge, instruction following, and language understanding are completely intact - none of those live in lm_head. This isn't a full finetune. It's a targeted style replacement on a single tensor.

Inference

Whatever you prefer - Gemma seems remarkably flexible in that regard. I run with temp 1.0, MinP 0.10 and the DRY sampler.
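In case MinP is unfamiliar, here is a rough sketch of what temperature plus MinP filtering does to the next-token distribution. The function name min_p_filter is made up for illustration. Real backends differ in the order they apply samplers, though at temp 1.0 the order of these two makes no difference. DRY is not covered here.

```python
import math


def min_p_filter(logits, temperature=1.0, min_p=0.10):
    """Apply temperature, then drop tokens whose probability is below
    min_p * (probability of the most likely token), and renormalise."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Softmax with the max subtracted for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


# Four candidate tokens with probabilities 0.6, 0.3, 0.05, 0.05.
# With min_p = 0.10 the cutoff is 0.06, so the two 0.05 tokens are removed.
example = min_p_filter([math.log(0.6), math.log(0.3), math.log(0.05), math.log(0.05)])
print([round(p, 3) for p in example])  # prints: [0.667, 0.333, 0.0, 0.0]
```

The cutoff scales with the top token's probability, so a confident distribution gets pruned hard while a flat one keeps most of its options.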

Prompt Format

Gemma 4's native chat template applies automatically.

Notes

For all I know this might only genuinely work for Gemma 4 31B specifically, but I'll certainly be poking other models if people enjoy this release. Feedback is, as always, very welcome!

Credits

Everyone from Anthracite! Hi, guys!
Latitude, for which I am still producing finetunes on a regular basis, helping me keep my skills sharp and up-to-date!
All the folks I chat with on a daily basis on Discord! You know who you are.
Anyone I forgot to mention, just in case!