ProphetOfBostrom committed
Commit ea421ff
1 Parent(s): 41238ab

Update README.md

# Kyllene-57B

[Kyllene-57B](https://huggingface.co/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF

### NOTICE: I did not use the original file! I started with Q6_K (there was no Q8)

#### There may well be problems with these quants, but I'll eat my own entire ass if a 57B Q6_K (>6.5 bpw) is the root of any of them. More suspect is how I produced the imatrix.
[imatrix included,](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) generated from [a 900 kB text file, also included.](./techmulcodetiny.utf8)

This file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a 900 kB file of coherent text only, with some formatting and code but no endless broken HTML tags or nonsense. It includes multilingual text, for those deep layers.
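The concatenation step is simple enough to sketch. The source file names below are guesses inferred from "techmulcodetiny" (the `standard_cal_data` directory contains `technical.utf8`, `multilingual.utf8`, `code.utf8` and `tiny.utf8`, among others), so treat this as illustrative rather than the exact recipe:

```python
from pathlib import Path

def concat_calibration(sources, dest):
    """Concatenate UTF-8 text files into a single calibration corpus.

    Returns the size of the resulting file in bytes, so you can check
    you landed near the ~900 kB target.
    """
    with open(dest, "wb") as out:
        for src in sources:
            out.write(Path(src).read_bytes())
    return Path(dest).stat().st_size

# Hypothetical usage, assuming the exllamav2 calibration files are local:
# size = concat_calibration(
#     ["technical.utf8", "multilingual.utf8", "code.utf8", "tiny.utf8"],
#     "techmulcodetiny.utf8",
# )
```

The resulting corpus is then what gets fed to llama.cpp's `imatrix` tool via its `-f` option when generating the importance matrix.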
artefact produced from:
 
# Downloads (eventually)

`upload in progress:`
[IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
- This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16 GB GPUs exist, and I may give it a go with Stable Diffusion.
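For reference, the "maths" is just parameters × bits per weight. A minimal estimator, assuming the nominal 57B parameter count:

```python
def quant_size_mib(n_params: float, bpw: float) -> float:
    """Rough weight-only size of a quantized model, in MiB.

    The quoted BPW for GGUF quants already folds in block-scale
    overhead; this ignores the K/V cache and runtime buffers entirely.
    """
    return n_params * bpw / 8 / 2**20

iq2_xs = quant_size_mib(57e9, 2.38)  # a bit over 16,000 MiB
iq3_xxs = quant_size_mib(57e9, 3.0)  # over 20,000 MiB
```

The IQ2_XS estimate lands in the same ballpark as the reported `CUDA0 buffer size = 15941.43 MiB` (the buffer covers only the weights actually offloaded, so it comes in slightly under), which is why this quant squeaks onto a 16 GB card while anything at 3 bpw does not.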
`upload scheduled in order: (big gpuboys just have to wait)`
[IQ2_M](./Kyllene-57B-v1.0.IQ2_M.gguf/) 2.7 BPW
- Briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. It ~~might~~ will be back.
[IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) 3.0 < `size` < 3.1 BPW
- 3090 enjoyers and their friends may want to run this with `-nkvo` and `-ngl 100` (no K/V offload, 100 layers on the GPU, in koboldcpp). There are 101 layers, and the last one becomes distressed if separated from its K/V cache: it invariably chokes your PCIe lanes to death as a survival mechanism.
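A hypothetical invocation of that advice, written against llama.cpp's own CLI (which is where the `-ngl` and `-nkvo` spellings come from; the model path and prompt are placeholders):

```shell
# -ngl 100 : offload up to 100 of the model's 101 layers to the GPU
# -nkvo    : short for --no-kv-offload, keep the K/V cache in system RAM
./main -m ./Kyllene-57B-v1.0.IQ3_XXS.gguf -ngl 100 -nkvo -p "your prompt"
```

(`./main` was the llama.cpp binary name at the time; newer builds call it `llama-cli`. koboldcpp exposes equivalent options through its own launcher.)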