
Great Local Model

#1
by morgul - opened

After testing about 50 different models (including all the new Mixtral hotness), this model (Q5_K) seems to give the best output (and runs amazingly on my M1 Max MacBook, tolerably on my 5900X 3080 PC).

Throwing some light coding tasks and a few other basic things I'd normally give ChatGPT at it, it seems like a really solid all-rounder. I did a side-by-side comparison using two instances of SillyTavern with the same saved chat (200+ messages, done initially with GPT-4), and everyone I had try it consistently preferred OpenDolphinMaid to GPT-4. I'm not saying it's a GPT killer or anything, but it's a damned good model with moderate hardware requirements.

Just wanted to say thanks, and drop a huge recommendation for anyone wanting a really solid general model that's a bit better at following the script than Noromaid/Hermes/Dolphin on their own. I'd say I still prefer XWin-LM-70b, but I prefer OpenDolphinMaid over XWin-LM-13b.

Damn good job. 😁

Owner

Thanks!

I Kneel.
The best model at this quality that I can fit on my GPUs, given my system specs.

5900X 3080 PC (Q5_K)

I assume 32GB of system RAM and a 10GB GPU? What context length and layer split do you use, in that case?

64GB of total system memory (but only ~30GB used with the model loaded and mlock enabled), and 10GB of VRAM on the GPU.

I found that ~18 GPU layers and 24 CPU threads work well, giving me ~1.5s TTFT and ~7-8 tok/s.
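For anyone wanting to reproduce this, the settings above map onto a llama.cpp invocation roughly like this (a sketch; the model filename and the context size are placeholders I picked, not values from this thread):

```shell
# Sketch of a llama.cpp run matching the settings discussed above.
#   -ngl 18  -> offload ~18 layers to the 10GB GPU
#   -t 24    -> 24 CPU threads for the remaining layers
#   --mlock  -> pin model weights in RAM so they can't be swapped out
# Model path and -c (context length) are illustrative placeholders.
./main -m ./model.Q5_K.gguf -ngl 18 -t 24 -c 4096 --mlock -p "Hello"
```

If you run out of VRAM, lowering `-ngl` is the usual first knob to turn.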

(On the Mac, it's actually hilariously faster, which makes me laugh.)
