Nexesenex committed
Commit c6b955e
1 Parent(s): e1079b9

Update README.md

Files changed (1): README.md (+15 -4)

README.md CHANGED
@@ -7,9 +7,22 @@ With a twist : the model I used come from a third party, and has been tweaked wi
 
 I don't know who did the job, only that I found this Q4_K_S quant of it hanging around without FP16 : https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF
 
-So I made a Q8_0 out of it (best way to requantize after), and requantized it in Q3_K_S and Q2_K for my needs.
+So I made a Q8_0 out of it (best way to requantize after), and requantized it in :
 
-Lowers quants (SOTA 2 bits) to come if I'm able to make an iMatrix on my config (64GB RAM).
+Full offload possible on 48GB VRAM with a huge context size :
+
+Q3_K_L
+
+Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example)
+
+Q3_K_M, Q3_K_S, Q3_K_XS, IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context!)
+Lower quality : Q2_K, Q2_K_S
+
+Full offload possible on 24GB VRAM with a decent context size.
+
+IQ2_XS SOTA (filename is partly wrong, b2035 and ch2500 are the real values)
+
+The higher ch number, the better the quality.
 
 And a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from the 21/01/2024 : https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933
 
@@ -34,8 +47,6 @@ More than 1% with linear rope 8 on Q3_K_S
 - WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
 - WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
 
-A Q3_K_M with iMatrix has been added as well, as well as a Q2_K_S.
-
 -----
 
 Edit : A Q3_K_XS, new quant offered in LlamaCPP, is otw, with a iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens)
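The workflow this commit describes (upcast the found Q4_K_S to Q8_0, build an iMatrix at ctx 32 over 2500 chunks, requantize to the lower quants, then measure wikitext perplexity at ctx 512) can be sketched with llama.cpp's command-line tools. This is a hedged sketch, not the author's exact commands: the binary names and flags (`quantize --allow-requantize --imatrix`, `imatrix`, `perplexity`) are from early-2024 llama.cpp builds, `calibration.txt` and the shortened output names are placeholders, and the model-crunching steps are shown as comments because they need the multi-gigabyte model files.

```shell
# Hypothetical requantization pipeline (llama.cpp tools; file names are placeholders).
#
# 1. Upcast the found Q4_K_S to Q8_0 -- the best base for requantizing when no FP16 exists:
#   ./quantize --allow-requantize \
#       WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf WG-Q8_0.gguf Q8_0
#
# 2. Build an importance matrix, ctx 32 with 2500 chunks, as in the commit's edit note:
#   ./imatrix -m WG-Q8_0.gguf -f calibration.txt -o wg.imatrix -c 32 --chunks 2500
#
# 3. Requantize with the iMatrix, e.g. to the SOTA 2-bit quant mentioned above:
#   ./quantize --imatrix wg.imatrix WG-Q8_0.gguf WG-IQ2_XS.gguf IQ2_XS
#
# 4. Measure wikitext perplexity at ctx 512, matching the table's last CSV field:
#   ./perplexity -m WG-IQ2_XS.gguf -f wiki.test.raw -c 512

# Sanity check on the token budget quoted for the iMatrix run:
ctx=32
chunks=2500
echo $((ctx * chunks))   # 32 tokens/chunk x 2500 chunks = 80000 tokens
```

The higher chunk count matters because the iMatrix averages activation statistics over everything it sees: more chunks means a better importance estimate, which is what the README's "the higher ch number, the better the quality" note refers to.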