Great model!

#1
by Altotas - opened

I used this model for a week or so, and it worked super well as a writing assistant, helping me with metaphors and adding flourishes to text while keeping it in line with my own style. I tried Gemma-The-Writer-J.GutenBerg-10B too, but that one felt like a downgrade so I'll stay with good old Gemma Writer 9B for now.

Owner

@Altotas

Excellent; thank you for the feedback.

J.Gutenberg is a little over the top, and with Brainstorm added it may affect some generations and instruction following.
That being said, you may want to try the new uncensored version, "Gemma The Writer - Restless Quill" (released yesterday).

https://huggingface.co/DavidAU/Gemma-The-Writer-N-Restless-Quill-10B-Uncensored-GGUF

That repo also shows prose controls (with examples), which may help with the "Gemma The Writer 9B" you are already using.

So far, one of my favorite models. It has a pretty good understanding of what I want it to generate. When it comes to vivid descriptions, it outperforms its more serious competitors at 20B parameters or more in adding details or lore to a scene. How is this even possible? The only downside is that I wish it had a bigger context window.

Excellent!
I may be able to answer the "how is this possible" part: in testing against other models, it seems there is a lot more processing per token going on.

As a result, IQ1S / IQ1M quants work extremely well, at very high t/s (I am testing low-BPW quants across multiple models and architectures at the moment).
However, its t/s at IQ1S is about half that of some models with a similar parameter count, and it runs at approximately Llama 2 13B speed.

For example, Mistral 7B models clock in at around 100 t/s, and Llama 3 / 3.1 at around 80 t/s.

Gemma 2 9B (The Writer) runs at 59 / 55 t/s (IQ1S / IQ1M) on a low-end 16 GB Nvidia card. Higher-end cards double that number.
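
For a rough sense of the ratios, here is a minimal sketch that only divides the approximate t/s figures quoted in this thread. The dictionary labels and the `relative_speed` helper are made up for illustration; nothing here is a new benchmark.

```python
# Approximate throughput figures (tokens/second) as quoted in this thread.
# Labels are illustrative; the numbers are rough and hardware dependent.
quoted_tps = {
    "Mistral 7B": 100.0,
    "Llama 3/3.1": 80.0,
    "Gemma 2 9B (IQ1S)": 59.0,
    "Gemma 2 9B (IQ1M)": 55.0,
}


def relative_speed(model: str, baseline: str) -> float:
    """Return the model's t/s as a fraction of the baseline's t/s."""
    return quoted_tps[model] / quoted_tps[baseline]


if __name__ == "__main__":
    for quant in ("Gemma 2 9B (IQ1S)", "Gemma 2 9B (IQ1M)"):
        for baseline in ("Mistral 7B", "Llama 3/3.1"):
            ratio = relative_speed(quant, baseline)
            print(f"{quant} runs at {ratio:.0%} of {baseline}")
```

With these numbers, the IQ1S quant comes out at roughly 0.59 of the Mistral 7B figure and roughly 0.74 of the Llama 3 / 3.1 figure.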

In terms of generation quality (at this low BPW), only 34Bs, 70Bs and 8x7B MoEs can match or beat it in some tests.

Special mention: Solar models (11B) have the same model file size as Gemma 2 9B, have more layers (48), run at 70+ t/s, and come close to Gemma 2 in quality.
