ProphetOfBostrom committed • a875cdb
Parent(s): d0af7d4

Nonsense added. False promises added. Also include BEGGING.md for engagement. Clarif. on penalty for incorrect assumptions about 6.5 bit integers and suggested dietary remediations.

README.md CHANGED
@@ -9,8 +9,9 @@ tags:
 - 2bit
 ---
 # Kyllene-57B
-[Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF
-###
+[Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw [GGUF](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q6_K.gguf)
+### Please heart or comment or mail me some anthrax spores if you use these! The download ticker won't work on a repo like this, so there's no feedback. I'm not wasting my time, right?
+#### NOTICE: I did not use the original file! I started with Q6_K (there was no Q8, and more precision for this would be absurd)
 #### There may well be problems with these quants, but I'll eat my own entire ass if a 57B Q6_K (>6.5bpw) is the root of any of them. More suspect is how I produced the imatrix.
 [imatrix included.](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) generated from [a 900k text file, also included](./techmulcodetiny.utf8)
 this file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a 900kb file of coherent text only, with some formatting and code but no endless broken HTML tags or nonsense. includes multilingual, for those deep layers.
@@ -25,8 +26,11 @@ imatrix run with default sampling settings besides the dataset (i think? i incre
 (someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.)
 
 # Downloads (eventually)
+under consideration:
+- Q2_K_S (imat only, but I think compatible with older things. I'm not very sure what this is.)
+- Q2_K (should be strictly better than [the original](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q2_K.gguf), but this may be where my --allow-requantize comes back to bite me; we'll see)
 
-```upload in progress
+```upload in progress (probably done by now)```
 
 [IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
 - This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16GB GPUs exist and I may give it a go with stable diffusion
@@ -37,4 +41,4 @@ imatrix run with default sampling settings besides the dataset (i think? i incre
 - briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. It ~~might~~ will be back.
 
 [IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) 3.0<`size`<3.1 BPW
-- 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 layers, in koboldcpp).
+- 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 layers, in koboldcpp). There are 101 layers and the last one becomes distressed if separated from its K/V cache. Invariably chokes your PCIe lanes to death as a survival mechanism. Nature is beautiful.
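For anyone retracing the steps the diff describes (imatrix from the Q6_K over the bundled calibration file, then requantizing downward with `--allow-requantize`, then running with the K/V cache kept off the GPU), a rough sketch using llama.cpp's `imatrix` and `quantize` tools follows. Tool paths are assumptions, the 57B files obviously aren't here, so the commands are echoed rather than executed:

```shell
# Hypothetical sketch of the pipeline described above; tool names/flags are
# llama.cpp's imatrix and quantize tools (early-2024 naming). Echo only.
MODEL=Kyllene-57B-v1.0.q6_K.gguf   # starting point was the Q6_K, not the original weights
CAL=techmulcodetiny.utf8           # the ~900k concatenated exllamav2 calibration text
IMAT=Kyllene-57B-v1.0.q6_K.gguf.imatrix

# 1. importance matrix computed from the Q6_K over the calibration file
echo ./imatrix -m "$MODEL" -f "$CAL" -o "$IMAT"

# 2. requantize downward; --allow-requantize is required because the input
#    is itself already a quant (the step the README flags as a possible risk)
echo ./quantize --allow-requantize --imatrix "$IMAT" "$MODEL" Kyllene-57B-v1.0.IQ2_XS.gguf IQ2_XS

# 3. run with all layers on the GPU but the K/V cache left in system RAM
echo ./main -m Kyllene-57B-v1.0.IQ2_XS.gguf -ngl 100 -nkvo
```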
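As a sanity check on the numbers quoted above (2.38 BPW against `CUDA0 buffer size = 15941.43 MiB`): bits per weight is just size over parameter count. A minimal sketch, assuming roughly 57e9 parameters for a 57B model:

```python
def bits_per_weight(size_mib: float, n_params: float) -> float:
    """Convert a size in MiB to bits per weight for a model with n_params weights."""
    return size_mib * 1024 * 1024 * 8 / n_params

# The 15941.43 MiB buffer over ~57e9 parameters lands near the quoted
# 2.38 BPW (a little low, since the buffer doesn't cover every tensor).
print(round(bits_per_weight(15941.43, 57e9), 2))  # ~2.35
```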