The best model in this weight class
This is clearly the best model in this weight class. The model shows an astonishing understanding of quite difficult concepts, ones that are not generally known and could not simply be "memorized". Compared to "Madness", for example, Magnum-v4 is very dry and uninspired.
Excellent; thank you for the feedback. I have not tried the new "mag" yet.
I wish somebody could make a 4-BPW exl2 quant for Darkness-12B for vram savings 🙄
@James2313123
Sorry, I can't; my machine just will not run Turboderp's EXL2 (!) ... you might be able to ask one of the EXL2 model makers to make one, though...
@DavidAU
Nvm, I painstakingly managed to get Turboderp's EXL2 working on my machine after two days of errors lol, and created a 4-bpw quant for Darkness-12B in an hour and a half on my 8 GB RTX 4060 Ti. I can now squeeze up to 10k context with great speeds on my GPU with 4-bit cache enabled in Tabby 😁
Here's the link if anyone's interested: https://huggingface.co/James2313123/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS-EXL2-4bpw
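For anyone who wants to make their own EXL2 quants, the conversion step is roughly the following sketch using exllamav2's convert.py. The input/output directory names here are placeholders, not the exact paths used above:

```shell
# Clone and install exllamav2 (requires a working CUDA build toolchain)
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt

# Convert the FP16 model to a 4.0-bpw EXL2 quant
# -i  : source (unquantized) model directory
# -o  : scratch/working directory for intermediate files
# -cf : directory for the final compiled quant
# -b  : target bits per weight
python convert.py \
    -i models/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS \
    -o work_dir \
    -cf MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS-EXL2-4bpw \
    -b 4.0
```

The measurement pass runs first and is the slow part; the working directory can be reused to try other bitrates without re-measuring.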
@James2313123
You the man; I will add a link on the repo page.
Curious ... Windows 11? Or?
How'd you get it to install?
@DavidAU
Sure. I'm on Windows 10. So basically, ummm... I got the setup tutorial from Reddit two days ago: https://www.reddit.com/r/LocalLLaMA/comments/1aybeji/exl2_quantization_for_dummies/
Now, the problem was that I was getting 'Unable to find cl.exe' and 'CUDA_HOME environment' errors all of the time, and the CUDA toolkit wouldn't detect the build tools for Visual Studio integration during installation, even after I manually added them to the environment path variables. After much digging I found a solution: I had to manually go into 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build' and run 'vcvarsall.bat amd64' in cmd, which fixed my 'cl.exe' issue. Then, instead of installing CUDA 12.6, I installed CUDA 11.8, which worked. And that's how I got Turboderp's EXL2 working hehe.
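For anyone hitting the same errors, the fix boils down to loading the MSVC build environment before building. A minimal sketch for a Windows cmd session (the Build Tools path may differ depending on your Visual Studio edition and version):

```shell
REM Load the MSVC x64 build environment so cl.exe is on PATH
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" amd64

REM Verify the compiler and the CUDA toolkit (11.8 here) are now visible
where cl
nvcc --version
```

Note that vcvarsall.bat only affects the current cmd session, so any build command has to run in that same window (or from a script that calls it first).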
@James2313123
Yes, CUDA issues are always fun. I am stuck on CUDA 12.2.x at the moment; everything might break otherwise (!).
CUDA and the toolkit are so buggy.