Update README.md
README.md
@@ -13,6 +13,20 @@ Lower quants (SOTA 2 bits) to come if I'm able to make an iMatrix on my config

-----

Edit: Due to a CPU (i7-6700K) that is poorly suited to AI work, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 640 tokens), which lowers the perplexity by:

More than 3% with Rope 8 on Q2_K:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512

More than 1% with Rope 8 on Q3_K_S:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
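
For reference, the gains quoted above work out from these figures as (6.2489 - 6.0482) / 6.2489 ≈ 3.2% on Q2_K and (5.6127 - 5.5461) / 5.6127 ≈ 1.2% on Q3_K_S.

Below is a minimal sketch of how such an iMatrix requant and perplexity run can be reproduced with the b1924-era llama.cpp tools (imatrix, quantize, perplexity); the file names and calibration text are placeholders rather than files from this repo, and "Rope 8 10000" is read here as linear rope scale 8 over base 10000:

```bash
# Sketch only: placeholder paths, flags as in b1924-era llama.cpp.

# 1) Small importance matrix: ctx 32, 25 chunks, from a short calibration text.
./imatrix -m wintergoddess-70b-32k-f16.gguf -f calibration.txt \
  -o wintergoddess-c32_ch25.imatrix -c 32 --chunks 25

# 2) Requantize the low-bit quants with that iMatrix.
for q in Q2_K Q3_K_S; do
  ./quantize --imatrix wintergoddess-c32_ch25.imatrix \
    wintergoddess-70b-32k-f16.gguf wintergoddess-imat-c32_ch25-$q.gguf $q
done

# 3) Wikitext perplexity at ctx 512, rope scale 8 (freq scale 0.125) over base 10000.
./perplexity -m wintergoddess-imat-c32_ch25-Q2_K.gguf -f wiki.test.raw -c 512 \
  --rope-freq-base 10000 --rope-freq-scale 0.125
```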
-----
Benchmarks of the original Q4_K_S quant I found:

Rope 8 10000