Nexesenex committed
Commit c6b955e
1 Parent(s): e1079b9

Update README.md

Files changed (1): README.md (+15 -4)

README.md CHANGED
@@ -7,9 +7,22 @@ With a twist : the model I used come from a third party, and has been tweaked wi
 
 I don't know who did the job, only that I found this Q4_K_S quant of it hanging around without FP16 : https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF
 
-So I made a Q8_0 out of it (best way to requantize after), and requantized it in Q3_K_S and Q2_K for my needs.
+So I made a Q8_0 out of it (best way to requantize after), and requantized it in :
 
-Lowers quants (SOTA 2 bits) to come if I'm able to make an iMatrix on my config (64GB RAM).
+Full offload possible on 48GB VRAM with a huge context size :
+
+Q3_K_L
+
+Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example)
+
+Q3_K_M, Q3_K_S, Q3_K_XS, IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context!)
+Lower quality : Q2_K, Q2_K_S
+
+Full offload possible on 24GB VRAM with a decent context size.
+
+IQ2_XS SOTA (filename is partly wrong, b2035 and ch2500 are the real values)
+
+The higher ch number, the better the quality.
 
 And a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from the 21/01/2024 : https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933
 
@@ -34,8 +47,6 @@ More than 1% with linear rope 8 on Q3_K_S
 - WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
 - WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
 
-A Q3_K_M with iMatrix has been added as well, as well as a Q2_K_S.
-
 -----
 
 Edit : A Q3_K_XS, new quant offered in LlamaCPP, is otw, with a iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens)
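The workflow this commit describes (upcast the found Q4_K_S to Q8_0, build an iMatrix at ctx 32 over 2500 chunks, requantize to the lower quants, then measure wikitext perplexity at ctx 512) can be sketched with llama.cpp's command-line tools. This is a hedged sketch, not the author's exact commands: the binary names and flags (`quantize --allow-requantize --imatrix`, `imatrix`, `perplexity`) are from early-2024 llama.cpp builds, `calibration.txt` and the shortened output names are placeholders, and the model-crunching steps are shown as comments because they need the multi-gigabyte model files.

```shell
# Hypothetical requantization pipeline (llama.cpp tools; file names are placeholders).
#
# 1. Upcast the found Q4_K_S to Q8_0 -- the best base for requantizing when no FP16 exists:
#   ./quantize --allow-requantize \
#       WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf WG-Q8_0.gguf Q8_0
#
# 2. Build an importance matrix, ctx 32 with 2500 chunks, as in the commit's edit note:
#   ./imatrix -m WG-Q8_0.gguf -f calibration.txt -o wg.imatrix -c 32 --chunks 2500
#
# 3. Requantize with the iMatrix, e.g. to the SOTA 2-bit quant mentioned above:
#   ./quantize --imatrix wg.imatrix WG-Q8_0.gguf WG-IQ2_XS.gguf IQ2_XS
#
# 4. Measure wikitext perplexity at ctx 512, matching the table's last CSV field:
#   ./perplexity -m WG-IQ2_XS.gguf -f wiki.test.raw -c 512

# Sanity check on the token budget quoted for the iMatrix run:
ctx=32
chunks=2500
echo $((ctx * chunks))   # 32 tokens/chunk x 2500 chunks = 80000 tokens
```

The higher chunk count matters because the iMatrix averages activation statistics over everything it sees: more chunks means a better importance estimate, which is what the README's "the higher ch number, the better the quality" note refers to.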