Update README.md
Browse files
README.md
CHANGED
@@ -11,6 +11,8 @@ So I made a Q8_0 out of it (best way to requantize after), and requantized it in
 
 Lowers quants (SOTA 2 bits) to come if I'm able to make an iMatrix on my config (64GB RAM).
 
+And a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from the 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933
+
 -----
 
 Edit: Due to a poor CPU (i7-6700k) for AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 640 tokens), and it lowers the perplexity by:
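The workflow described in the README (build an importance matrix on a Q8_0 requant, then use it to produce the low-bit quants) can be sketched roughly as below. This is a hedged illustration, not the author's exact invocation: the calibration file name, output paths, and model names are placeholders, and the `imatrix`/`quantize` tool flags are as shipped in llama.cpp builds from that period.

```shell
# Sketch of the iMatrix + requantize workflow (paths/filenames are hypothetical).
# 1) Compute an importance matrix on the Q8_0 requant, with a small context
#    (-c 32) and a limited number of chunks (--chunks 25) to fit modest hardware.
./imatrix -m model-Q8_0.gguf -f calibration.txt -o model.imatrix -c 32 --chunks 25

# 2) Requantize from Q8_0 down to the target low-bit types, guided by the matrix.
./quantize --imatrix model.imatrix model-Q8_0.gguf model-Q3_K_S.gguf Q3_K_S
./quantize --imatrix model.imatrix model-Q8_0.gguf model-Q2_K.gguf   Q2_K
```

The small context and chunk count trade calibration quality for speed and memory, which matches the README's note about limited CPU and RAM; a larger calibration run would normally be preferred when the hardware allows it.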