Versi-StyleTune-31B (GGUF)

GGUF quants of Iloqt/Versi-StyleTune-head: a text-only Gemma 4 31B with Nimbz/Versipellis-31B as the base and the lm_head (output projection) grafted from Gryphe/Gemma-4-31B-StyleTune.
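
As a rough illustration of what grafting the output projection means, the sketch below swaps a single tensor from a donor checkpoint into a base checkpoint. It uses plain dictionaries as stand-ins for state dicts. The key name `lm_head.weight` and the helper `graft_tensor` are assumptions made for this example; this is not the actual merge script.

```python
def graft_tensor(base, donor, key="lm_head.weight"):
    """Return a copy of `base` with one tensor replaced by the donor's.

    `key` is a hypothetical tensor name used for illustration only.
    """
    if key not in donor:
        raise KeyError(f"donor has no tensor named {key!r}")
    # Refuse to graft a tensor whose size differs from the one it replaces.
    if key in base and len(base[key]) != len(donor[key]):
        raise ValueError("donor tensor size does not match the base")
    merged = dict(base)       # shallow copy, so `base` itself is left untouched
    merged[key] = donor[key]  # every other tensor still comes from the base
    return merged


# Toy stand-ins for the two checkpoints (real tensors would be large matrices).
base = {"embed_tokens.weight": [1.0, 2.0], "lm_head.weight": [0.1, 0.2]}
donor = {"embed_tokens.weight": [9.0, 9.0], "lm_head.weight": [0.7, 0.8]}
merged = graft_tensor(base, donor)
```

Only the named tensor changes; the embeddings and everything else stay with the base model.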

All credit goes to Nimbz and Gryphe for the original models; I only made the merge.

Variants

Standard quants:

  • Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0 — body-only quantization.

hb16 variants (head + embeddings kept at BF16):

  • Q4_K_M-hb16, Q5_K_M-hb16, Q6_K-hb16
  • Keeps the grafted StyleTune lm_head and embed_tokens at BF16 while the rest of the body is quantized to the K-quant named in the variant.

attn8-HB variants (Q8_0 attention + BF16 head + embeddings):

  • Q4_K_M-attn8-HB, Q5_K_M-attn8-HB, Q6_K-attn8-HB
  • Same BF16 head and embeddings as the hb16 variants, with the attention tensors additionally held at Q8_0 instead of the body K-quant.
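
The variant names above can be read as a body quant plus a set of per-tensor overrides. The sketch below decodes a variant name into that plan and builds a matching llama-quantize command line, without running it. The flags `--output-tensor-type`, `--token-embedding-type` and `--tensor-type` are llama.cpp options as I understand them; the attention tensor names, the helper names and the file names are assumptions for illustration, and this is not necessarily the recipe used for these files.

```python
# Assumed GGUF name fragments for the attention projections (illustrative).
ATTENTION_TENSORS = ("attn_q", "attn_k", "attn_v", "attn_output")


def override_plan(variant):
    """Split a variant name into its body quant and precision overrides."""
    body, _, suffix = variant.partition("-")
    if suffix == "":
        # Standard quant: no overrides, the quantizer's defaults apply.
        return {"body": body, "head_and_embeddings": None, "attention": None}
    if suffix == "hb16":
        return {"body": body, "head_and_embeddings": "BF16", "attention": None}
    if suffix == "attn8-HB":
        return {"body": body, "head_and_embeddings": "BF16", "attention": "Q8_0"}
    raise ValueError(f"unknown variant suffix: {suffix!r}")


def quantize_command(variant, src, dst):
    """Build (but do not execute) a llama-quantize argument list."""
    plan = override_plan(variant)
    cmd = ["llama-quantize"]
    if plan["head_and_embeddings"]:
        kept = plan["head_and_embeddings"].lower()
        cmd += ["--output-tensor-type", kept, "--token-embedding-type", kept]
    if plan["attention"]:
        for name in ATTENTION_TENSORS:
            cmd += ["--tensor-type", f"{name}={plan['attention'].lower()}"]
    cmd += [src, dst, plan["body"]]
    return cmd


# Hypothetical file names, shown only to make the argument order concrete.
print(" ".join(quantize_command("Q5_K_M-attn8-HB", "model-bf16.gguf", "model-Q5_K_M-attn8-HB.gguf")))
```

Keeping the plan separate from the command makes it easy to see at a glance which tensors each variant protects.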

Notes

  • Run with the Gemma 4 chat template; thinking is off by default.
  • Like all Gemma 4 models, it benefits from a repetition penalty or DRY sampling to avoid token loops.
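
One way to apply these notes is through llama.cpp's llama-server. The sketch below only assembles the argument list. `--jinja`, `--repeat-penalty` and `--dry-multiplier` are llama.cpp flags as I understand them; the numeric values, the helper name `server_args` and the model file name are illustrative assumptions, not tuned recommendations.

```python
def server_args(model_path, loop_guard="dry"):
    """Build (but do not execute) a llama-server argument list."""
    # --jinja asks the server to use the chat template stored in the GGUF file.
    cmd = ["llama-server", "-m", model_path, "--jinja"]
    if loop_guard == "dry":
        # DRY sampling is enabled by a non-zero multiplier; 0.8 is an assumed value.
        cmd += ["--dry-multiplier", "0.8"]
    elif loop_guard == "repeat":
        # Classic repetition penalty; 1.1 is an assumed value.
        cmd += ["--repeat-penalty", "1.1"]
    else:
        raise ValueError(f"unknown loop guard: {loop_guard!r}")
    return cmd


# Hypothetical local file name.
print(" ".join(server_args("Versi-StyleTune-31B-Q4_K_M.gguf")))
```

Either guard can be tried on its own first; which one works better is likely to depend on the prompt and the quant.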

Format: GGUF
Model size: 32B params
Architecture: gemma4