General discussion and feedback.

#1 opened by Lewdiculous

@Cran-May - Feel free to perform testing of the requested quants and to share the feedback. Best of luck in your tuning.

Lewdiculous pinned discussion

Which version should I download if I want to use it in LM Studio (or a similar app) on a notebook with these specs: i5-12500H, 64 GB RAM, RTX 3050 with 4 GB VRAM?

Fastest one would be:
https://huggingface.co/Lewdiculous/firefly-gemma-7b-GGUF-IQ-Imatrix/blob/main/firefly-gemma-7b-IQ2_XXS-imatrix.gguf

But this one should offer a better balance of quality at the cost of some speed:
https://huggingface.co/Lewdiculous/firefly-gemma-7b-GGUF-IQ-Imatrix/blob/main/firefly-gemma-7b-IQ4_XS-imatrix.gguf

I recommend a Q5 if quality is very important:
https://huggingface.co/Lewdiculous/firefly-gemma-7b-GGUF-IQ-Imatrix/blob/main/firefly-gemma-7b-Q5_K_S-imatrix.gguf
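
If you'd rather script it yourself instead of (or alongside) LM Studio, here's a minimal sketch using llama-cpp-python and huggingface_hub. The n_gpu_layers value is just a placeholder to tune for a 4 GB card, and you'd need a CUDA-enabled build of llama-cpp-python for the offload to do anything:

```python
# Minimal sketch: pull the IQ4_XS quant from this repo and run it locally.
# Assumes: pip install huggingface_hub llama-cpp-python (CUDA build for GPU offload).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file (cached locally by huggingface_hub after the first run).
model_path = hf_hub_download(
    repo_id="Lewdiculous/firefly-gemma-7b-GGUF-IQ-Imatrix",
    filename="firefly-gemma-7b-IQ4_XS-imatrix.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=4096,       # context window
    n_gpu_layers=12,  # rough guess for 4 GB VRAM; raise/lower until it stops running out of memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what a GGUF quant is in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```

At the IQ4_XS size and above, a 7B model generally won't fit entirely in 4 GB of VRAM, so partial offload plus your 64 GB of system RAM is the realistic setup; the same idea applies to the GPU offload slider in LM Studio.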

I'm still waiting on @Cran-May's testing to make sure everything is okay; if you can test and provide feedback, that would also be useful.

My first impression after trying the Q6_K and IQ4_XS versions is that they work OK but didn't blow my mind. You definitely have better models in your collection. I'll keep testing with other prompts and see if those work better.

> You definitely have better models in your collection.

Ah, yeah so...

> firefly-gemma-7b is trained based on gemma-7b to act as a helpful and harmless AI assistant.

I actually didn't even add this one to the Collection, haha, since it's kind of not like the rest.

This model isn't quite like the others; for example, Eris and InfinityRP benchmark much higher. This was a request for a general-use assistant model based on the Google Gemma architecture, not really meant for what we usually do. It's more of a model you'd deploy locally as a personal assistant, so it's not all that useful for me personally, haha.

Generally speaking, keep an eye on the Favorites collection:

https://huggingface.co/collections/Lewdiculous/personal-favorites-65dcbe240e6ad245510519aa

That's where I'll group the more outstanding models. There's a lot of experimentation going on at the moment, but these:

  1. https://huggingface.co/Lewdiculous/InfinityRP-v1-7B-GGUF-IQ-Imatrix

  2. https://huggingface.co/Lewdiculous/Persephone_7B-GGUF-IQ-Imatrix

  3. https://huggingface.co/Lewdiculous/Eris-Lelanacles-7b-GGUF-IQ-Imatrix

  4. https://huggingface.co/Lewdiculous/Infinitely-Laydiculous-7B-GGUF-IQ-Imatrix

Should bench/perform a lot better, and are more in line with my own use case.

Cheers, and thanks for the feedback. I'm just glad the model isn't broken, as I'm not used to making quants for Gemma-based models.

Oh yeah, that explains it. I just assumed it was supposed to be an RP model too, because firefly is such a cute name ;)

@WesPro - Oh, you can be sure it would be a firefly anime girl in that case.

:'3
