Status

Silver-Siren-ST-12B (Mistral-Nemo-12B-based StyleTune)

A while ago, I stumbled upon Gryphe's report on "Style-Tuning" (Gemma-4-31B-StyleTune). The concept was entirely new to me, but it sounded promising enough that I had to test it myself.

This model is the result of that experiment: a technical test of surgical style-tuning, meant to verify whether confining style changes to a single tensor (lm_head) while freezing the core weights works on the Mistral-Nemo-12B architecture.

Methodology & Concept

Normally, a fine-tune updates most or all layers of a model to align both its reasoning and its formatting with a new dataset. This "StyleTune" approach does the exact opposite:

  1. Freeze everything: All attention and MLP layers (Layers 0–39) remain completely untouched. The underlying logic, world knowledge, and instruction-following capabilities are preserved exactly as they were.
  2. Target the Language Center: Only a single tensor, the lm_head (output projection), is trained.

By retraining only the lm_head, the model doesn't become "smarter" or "dumber," but its vocabulary, sentence structure, and prose quality are completely recalibrated. It changes the voice of the model, not its brain.
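The two steps above can be sketched in a few lines. This is a minimal illustration, not the exact training script used for this model: the helper name freeze_all_but and the ToyModel stand-in are hypothetical, and the parameter names are assumed to follow the usual Mistral layout in Transformers (model.layers.N..., lm_head.weight). The helper only relies on named_parameters() and requires_grad, so it should behave the same on a real torch.nn.Module.

```python
from types import SimpleNamespace


def freeze_all_but(model, trainable_name="lm_head"):
    """Mark only parameters under `trainable_name` as trainable.

    Works on any object whose named_parameters() yields (name, param)
    pairs where param has a requires_grad attribute.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        is_target = name == trainable_name or name.startswith(trainable_name + ".")
        param.requires_grad = is_target
        if is_target:
            trainable.append(name)
    return trainable


class ToyModel:
    """Hypothetical stand-in that mimics the named_parameters() interface."""

    def __init__(self, names):
        self._params = {n: SimpleNamespace(requires_grad=True) for n in names}

    def named_parameters(self):
        return self._params.items()


# Assumed parameter names, modelled on the usual Mistral layout.
toy = ToyModel([
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.39.mlp.down_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
])
print(freeze_all_but(toy))  # ['lm_head.weight']
```

One caveat: if a model ties its input embeddings to the output projection, lm_head may not show up as a separate parameter, so it is worth checking that the returned list is not empty before training.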


The Target Base: Why this model?

To demonstrate the contrast and efficacy of this method, I deliberately chose Vortex5/Silver-Siren-12B as the target base.

Important Note: This choice is purely technical and meant with the utmost respect for the original author. Silver-Siren-12B is a very popular, emotion-forward merge (incorporating models like Dark-Nexus, Elysian-Sunrise, LunaMaid, and others) that is tuned for sensational, dramatic, and immersive interactions. Because its native style profile is so distinct, it served as the perfect benchmark to test whether a pure lm_head tune could cleanly overwrite a deeply baked-in stylistic bias without degrading the underlying merge quality.

By exposing this base to a highly curated, classical sci-fi literary dataset (inspired by Asimov, Huxley, and Lem), the model underwent a dramatic transformation in its prose delivery.


Training Details & Parameters

  • Epochs: 3
  • Learning Rate: 4e-4 (Linear Scheduler)
  • Target Modules: lm_head only (all other linear layers frozen)
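To put "lm_head only" in perspective, here is a rough parameter count. The dimensions below are assumptions taken from the published Mistral-Nemo-12B configuration (only the 40 layers are stated on this card), so please check them against the model's config.json before relying on the numbers.

```python
# Assumed Mistral-Nemo-12B dimensions; verify against config.json.
VOCAB_SIZE = 131_072
HIDDEN_SIZE = 5_120
INTERMEDIATE_SIZE = 14_336
NUM_LAYERS = 40          # Layers 0-39, as stated on this card
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128


def param_counts():
    """Return (lm_head parameters, total parameters) for the assumed config."""
    lm_head = VOCAB_SIZE * HIDDEN_SIZE
    embed = VOCAB_SIZE * HIDDEN_SIZE  # assumes untied input embeddings
    # q_proj and o_proj use NUM_HEADS, k_proj and v_proj use NUM_KV_HEADS
    attn = HIDDEN_SIZE * HEAD_DIM * (2 * NUM_HEADS + 2 * NUM_KV_HEADS)
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE  # gate, up and down projections
    norms = 2 * HIDDEN_SIZE                    # two RMSNorm weights per layer
    total = embed + lm_head + NUM_LAYERS * (attn + mlp + norms) + HIDDEN_SIZE
    return lm_head, total


lm_head, total = param_counts()
print(f"lm_head: {lm_head:,} of {total:,} parameters ({lm_head / total:.2%})")
```

Under these assumed values, the trained tensor accounts for roughly 5.5% of the weights; everything else stays frozen.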

Recommended Sampler Settings

  • Temperature: 0.7 - 0.9
  • Min_P: 0.05
  • Top_P: 0.95
  • Repetition Penalty: 1.05
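For readers unfamiliar with these samplers, the sketch below shows how temperature, Min_P, and Top_P narrow down the next-token distribution. It is a simplified, pure-Python illustration: the function name filter_probs is hypothetical, the order of the filters (temperature, then Min_P, then Top_P) is an assumption that differs between inference backends, and the repetition penalty is left out.

```python
import math


def filter_probs(logits, temperature=0.8, min_p=0.05, top_p=0.95):
    """Return {token_index: probability} after temperature, min-p and top-p."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Min-p: drop tokens whose probability is below min_p * (top probability).
    threshold = min_p * max(probs.values())
    probs = {i: p for i, p in probs.items() if p >= threshold}
    norm = sum(probs.values())
    probs = {i: p / norm for i, p in probs.items()}

    # Top-p: keep the smallest high-probability set whose mass reaches top_p.
    kept, cumulative = {}, 0.0
    for i, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[i] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


# Example with four hypothetical token logits.
print(filter_probs([2.0, 1.0, 0.0, -5.0]))
```

In practice these values are passed to the inference backend directly rather than implemented by hand; the sketch is only meant to show what each setting does.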

Thank you

Model format: Safetensors · Model size: 12B params · Tensor type: BF16