ProphetOfBostrom committed • a875cdb
Parent(s): d0af7d4

Nonsense added. False promises added. Also include BEGGING.md for engagement. Clarif. on penalty for incorrect assumptions about 6.5 bit integers and suggested dietary remediations.

README.md CHANGED
@@ -9,8 +9,9 @@ tags:
 - 2bit
 ---
 # Kyllene-57B
-[Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF
-###
+[Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw [GGUF](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q6_K.gguf)
+### Please heart or comment or mail me some anthrax spores if you use these! The download ticker won't work on a repo like this, so there's no feedback. I'm not wasting my time, right?
+#### NOTICE: I did not use the original file! I started with Q6_K (there was no Q8, and more precision for this would be absurd)
 #### There may well be problems with these quants, but I'll eat my own entire ass if a 57B Q6_K (>6.5bpw) is the root of any of them. More suspect is how I produced the imatrix.
 [imatrix included.](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) generated from [a 900k text file, also included](./techmulcodetiny.utf8)
 this file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a 900kb file of coherent text only, with some formatting and code but no endless broken HTML tags or nonsense. includes multilingual, for those deep layers.
@@ -25,8 +26,11 @@ imatrix run with default sampling settings besides the dataset (i think? i incre
 (someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.)
 
 # Downloads (eventually)
+under consideration:
+- Q2_K_S (imat only, but I think compatible with older things. I'm not very sure what this is.)
+- Q2_K (should be strictly better than [the original](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q2_K.gguf), but this may be where my --allow-requantize comes back to bite me; we'll see)
 
-```upload in progress
+```upload in progress (probably done by now)```
 
 [IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
 - This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16GB GPUs exist and I may give it a go with stable diffusion
@@ -37,4 +41,4 @@ imatrix run with default sampling settings besides the dataset (i think? i incre
 - briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. It ~~might~~ will be back.
 
 [IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) 3.0<`size`<3.1 BPW
-- 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 layers, in koboldcpp).
+- 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 layers, in koboldcpp). There are 101 layers and the last one becomes distressed if separated from its K/V cache. Invariably chokes your PCIe lanes to death as a survival mechanism. Nature is beautiful.
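For anyone retracing the steps the diff describes (imatrix from the Q6_K over the bundled calibration file, then requantizing downward with `--allow-requantize`, then running with the K/V cache kept off the GPU), a rough sketch using llama.cpp's `imatrix` and `quantize` tools follows. Tool paths are assumptions, the 57B files obviously aren't here, so the commands are echoed rather than executed:

```shell
# Hypothetical sketch of the pipeline described above; tool names/flags are
# llama.cpp's imatrix and quantize tools (early-2024 naming). Echo only.
MODEL=Kyllene-57B-v1.0.q6_K.gguf   # starting point was the Q6_K, not the original weights
CAL=techmulcodetiny.utf8           # the ~900k concatenated exllamav2 calibration text
IMAT=Kyllene-57B-v1.0.q6_K.gguf.imatrix

# 1. importance matrix computed from the Q6_K over the calibration file
echo ./imatrix -m "$MODEL" -f "$CAL" -o "$IMAT"

# 2. requantize downward; --allow-requantize is required because the input
#    is itself already a quant (the step the README flags as a possible risk)
echo ./quantize --allow-requantize --imatrix "$IMAT" "$MODEL" Kyllene-57B-v1.0.IQ2_XS.gguf IQ2_XS

# 3. run with all layers on the GPU but the K/V cache left in system RAM
echo ./main -m Kyllene-57B-v1.0.IQ2_XS.gguf -ngl 100 -nkvo
```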
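As a sanity check on the numbers quoted above (2.38 BPW against `CUDA0 buffer size = 15941.43 MiB`): bits per weight is just size over parameter count. A minimal sketch, assuming roughly 57e9 parameters for a 57B model:

```python
def bits_per_weight(size_mib: float, n_params: float) -> float:
    """Convert a size in MiB to bits per weight for a model with n_params weights."""
    return size_mib * 1024 * 1024 * 8 / n_params

# The 15941.43 MiB buffer over ~57e9 parameters lands near the quoted
# 2.38 BPW (a little low, since the buffer doesn't cover every tensor).
print(round(bits_per_weight(15941.43, 57e9), 2))  # ~2.35
```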