Silver-Siren-ST-12B (Mistral-Nemo-12B-based StyleTune)
A while ago, I stumbled upon Gryphe's report on "Style-Tuning" (Gemma-4-31B-StyleTune). The concept was entirely new to me, but it sounded promising enough that I had to test it myself.
This model is the result of that experiment: a technical test of surgical style-tuning, checking whether isolating style changes to a single tensor (lm_head) while freezing the core weights works effectively on the Mistral-NeMo-12B architecture.
Methodology & Concept
Normally, a fine-tune updates weights across as many layers as possible to align both the reasoning and the formatting of a model to a new dataset. This "StyleTune" approach does the exact opposite:
- Freeze everything: All attention and MLP layers (Layers 0–39) remain completely untouched. The underlying logic, world knowledge, and instruction-following capabilities are preserved exactly as they were.
- Target the Language Center: Only a single tensor, the lm_head (output projection), is trained.
By retraining only the lm_head, the model doesn't become "smarter" or "dumber," but its vocabulary, sentence structure, and prose quality are completely recalibrated. It changes the voice of the model, not its brain.
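The freeze-everything, train-only-the-head idea described above can be sketched on a toy scale. This is a minimal pure-Python illustration, not the training code used for this model: the fixed `HIDDEN_STATES` table is a hypothetical stand-in for the frozen attention and MLP layers, `head` plays the role of lm_head, and the token ids, learning rate, and epoch count are made up for the demo.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Frozen "backbone": a fixed map from context token to hidden state.
# Hypothetical stand-in for the untouched layers; it is never updated.
HIDDEN_STATES = {0: [1.0, 0.0, 0.5], 1: [0.0, 1.0, -0.5]}

def logits_for(head, hidden):
    """Output projection: one dot product per vocabulary row."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in head]

def predict(head, ctx):
    """Greedy next-token choice for a given context token."""
    logits = logits_for(head, HIDDEN_STATES[ctx])
    return max(range(len(logits)), key=lambda v: logits[v])

def train_head_only(head, pairs, lr=0.5, epochs=200):
    """Cross-entropy gradient descent that touches only `head`."""
    for _ in range(epochs):
        for ctx, target in pairs:
            hidden = HIDDEN_STATES[ctx]
            probs = softmax(logits_for(head, hidden))
            for v, row in enumerate(head):
                grad = probs[v] - (1.0 if v == target else 0.0)
                for j, x in enumerate(hidden):
                    row[j] -= lr * grad * x

# Toy head (4-token vocabulary) that initially prefers token 0 everywhere.
head = [[1.0, 1.0, 0.0], [0.0] * 3, [0.0] * 3, [0.0] * 3]
backbone_snapshot = {k: v[:] for k, v in HIDDEN_STATES.items()}

before = [predict(head, 0), predict(head, 1)]
# Made-up "style" data: after token 0 prefer token 2, after token 1 prefer token 3.
train_head_only(head, pairs=[(0, 2), (1, 3)])
after = [predict(head, 0), predict(head, 1)]
```

The hidden states are identical before and after training, while the token choices move from the old preference to the new one. In a real framework the same effect would typically be achieved by disabling gradients on every parameter except the output projection.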
The Target Base: Why this model?
To demonstrate the contrast and efficacy of this method, I deliberately chose Vortex5/Silver-Siren-12B as the target base.
Important Note: This choice is purely technical and meant with the utmost respect for the original author.
Silver-Siren-12B is a highly popular, emotion-forward merge (incorporating models like Dark-Nexus, Elysian-Sunrise, LunaMaid, and others) that is highly optimized for sensational, dramatic, and immersive interactions. Because its native style profile is so distinct, it served as the perfect benchmark to test if a pure lm_head tune could cleanly overwrite a deeply baked-in stylistic bias without degrading the underlying merge quality.
By exposing this base to a highly curated, classical sci-fi literary dataset (inspired by Asimov, Huxley, and Lem), the model underwent a dramatic transformation in its prose delivery.
Training Details & Parameters
- Epochs: 3
- Learning Rate: 4e-4 (Linear Scheduler)
- Target Modules: lm_head only (all other linear layers frozen)
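For readers unfamiliar with a linear scheduler, the sketch below shows how the learning rate would evolve under one common definition. Only the peak rate (4e-4) and the epoch count (3) come from the list above; decay to exactly zero, the absence of warmup, and the figure of 100 optimizer steps per epoch are assumptions made for illustration.

```python
def linear_lr(step, total_steps, peak_lr=4e-4, warmup_steps=0):
    """Learning rate at `step` for a linear schedule.

    Assumed shape: optional linear warmup to `peak_lr`, then linear
    decay to zero at `total_steps`. The model card only states
    "Linear Scheduler", so warmup and end value are guesses.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical run length: 3 epochs at 100 optimizer steps per epoch.
STEPS_PER_EPOCH = 100  # assumption, not taken from the model card
TOTAL_STEPS = 3 * STEPS_PER_EPOCH

# Learning rate at the start of each epoch and at the very end.
lr_at_epoch_start = [linear_lr(e * STEPS_PER_EPOCH, TOTAL_STEPS) for e in range(3)]
lr_at_end = linear_lr(TOTAL_STEPS, TOTAL_STEPS)
```

Under these assumptions the rate starts at the peak, falls by a third of the peak per epoch, and reaches zero on the final step.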
Recommended Sampler Settings
- Temperature: 0.7 - 0.9
- Min_P: 0.05
- Top_P: 0.95
- Repetition Penalty: 1.05
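To make the interaction of these settings concrete, here is a minimal pure-Python sketch of temperature scaling followed by Min_P and Top_P filtering on a made-up five-token distribution. The order of operations is an assumption, since inference backends differ in how they chain samplers, and the repetition penalty is left out for brevity. The example logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p=0.05):
    """Zero out tokens whose probability is below min_p * max probability."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]

def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of top tokens whose share of the mass reaches top_p."""
    total = sum(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        if probs[i] == 0.0:
            break  # everything after this was already filtered out
        kept[i] = probs[i]
        cumulative += probs[i] / total
        if cumulative >= top_p:
            break
    return kept

def renormalize(probs):
    """Rescale surviving probabilities so they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]

# Hypothetical next-token logits for a five-token vocabulary.
logits = [5.0, 4.0, 2.0, 0.0, -3.0]
probs = softmax(logits, temperature=0.8)
filtered = renormalize(top_p_filter(min_p_filter(probs, min_p=0.05), top_p=0.95))
surviving = [i for i, p in enumerate(filtered) if p > 0.0]
```

With these toy logits, Min_P at 0.05 already removes the long tail, so only the two most likely tokens remain candidates for sampling. Raising the temperature toward 0.9 flattens the distribution and lets more tokens clear the Min_P threshold.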
Thank you
- to Gryphe for posting this excellent finding. https://huggingface.co/Gryphe/Gemma-4-31B-StyleTune
- to Vortex5 for creating the base model, Vortex5/Silver-Siren-12B.