DavidAU/Gemma-The-Writer-9B-GGUF

Great model!

by Altotas - opened Oct 25, 2024

Oct 25, 2024

I used this model for a week or so, and it worked super well as a writing assistant, helping me with metaphors and adding flourishes to text while keeping it in line with my own style. I tried Gemma-The-Writer-J.GutenBerg-10B too, but that one felt like a downgrade so I'll stay with good old Gemma Writer 9B for now.

DavidAU

Owner Oct 26, 2024

@Altotas

Excellent; thank you for the feedback.

J.Gutenberg is a little over the top, and combined with Brainstorm it may affect some generation / instruction following.
That being said, you may want to try the new uncensored version, "Gemma The Writer - Restless Quill" (released yesterday).

https://huggingface.co/DavidAU/Gemma-The-Writer-N-Restless-Quill-10B-Uncensored-GGUF

That repo also covers prose control (with examples), which may help with the "Gemma The Writer 9B" you are already using.

InterDimensionalCat

Dec 5, 2024

So far, one of my favorite models. It has a pretty good understanding of what I want it to generate. When it comes to vivid descriptions, it outperforms its more serious competitors with 20B or more, in terms of adding more details or lore to a scene. How is this even possible? The only downside is that I wish it had a bigger context window.

DavidAU

Owner Dec 5, 2024

edited Dec 5, 2024

Excellent!
I may be able to answer the "how is this possible" part: in testing against other models, there seems to be a lot more processing per token going on.

As a result, IQ1S / IQ1M work extremely well, at very high t/s (I am testing low BPW quants across multiple models / archs at the moment).
However, its t/s at IQ1S is about 1/2 the speed of some models that are close to it parameter-wise, and it operates at approximately Llama2 13B speed.

E.g., Mistral 7B models clock in around the 100 t/s range; Llama 3 / 3.1 around the 80 t/s range.

Gemma 2 9B (The Writer) runs at 59 / 55 t/s (IQ1S / IQ1M) on a low-end 16 GB Nvidia card. Higher-end cards roughly double that number.
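
To put those t/s figures in perspective, here is a rough sketch that converts them into wall-clock time for a reply. The numbers are the ones quoted in this thread; the 1000-token reply length, the helper names, and the assumption that all figures are comparable (same card, similar quant level) are mine for illustration only, not measurements.

```python
# Throughput figures quoted in this thread (tokens per second).
# Assumption: treated as directly comparable, which the thread does not
# strictly guarantee (hardware and quant level may differ per model).
QUOTED_TPS = {
    "Mistral 7B": 100.0,
    "Llama 3 / 3.1": 80.0,
    "Gemma 2 9B (IQ1S)": 59.0,
    "Gemma 2 9B (IQ1M)": 55.0,
}


def seconds_for(tokens: int, tps: float) -> float:
    """Seconds needed to generate `tokens` at a steady rate of `tps`."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


def relative_speed(tps: float, baseline_tps: float) -> float:
    """Speed of one model expressed as a fraction of a baseline model."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return tps / baseline_tps


# Hypothetical reply length, chosen only to make the comparison concrete.
REPLY_TOKENS = 1000
BASELINE = QUOTED_TPS["Mistral 7B"]

for name, tps in QUOTED_TPS.items():
    secs = seconds_for(REPLY_TOKENS, tps)
    frac = relative_speed(tps, BASELINE)
    print(f"{name:20s} {tps:6.1f} t/s  {secs:5.1f} s per {REPLY_TOKENS} tokens  "
          f"({frac:.0%} of Mistral 7B)")
```

With these figures, a 1000-token reply takes about 10 seconds at 100 t/s and about 17 seconds at 59 t/s, so the slowdown is noticeable but still well above reading speed.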

In terms of generation quality (at this low BPW), only 34Bs, 70Bs and 8x7B (MoEs) can match / beat it on some tests.

Special mention: Solar models (11B) are the same size (model file size) as Gemma 2 9B, have more layers (48), operate at 70+ t/s, AND are close to Gemma 2 in terms of quality.
