ProphetOfBostrom committed on
Commit
6dc2008
1 Parent(s): a9f9128

readme linked up with schedule, calibration info, some attempt at reproducibility, nonsense

Files changed (1)
  1. README.md +26 -6
README.md CHANGED
@@ -3,18 +3,38 @@ license: other
 license_name: yi-license
 license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
 tags:
-- merge, GGUF, imatrix
+- merge
+- GGUF
+- imatrix
+- 2bit
 ---
-## [Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF
+# Kyllene-57B
+[Kyllene-57B](TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF
 ### NOTICE: I did not use the original file! I started with Q6_K (there was no Q8)
 #### There may well be problems with these quants but I'll eat my own ass if a 57B Q6_K (>6.5bpw) is the root of any of them. More suspect is how I produced the imatrix.
-imatrix included. generated from a 900k text file, also included
-this file was made by concatenating most of the default exllamav2 calibration data. a 900kb file of coherent text only, with some formatting and code but no endless broken html tags or nonsense. includes multilingual, for those deep layers.
+[imatrix included.](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) generated from [a 900k text file, also included](./techmulcodetiny.utf8)
+this file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data). a 900kb file of coherent text only, with some formatting and code but no endless broken html tags or nonsense. includes multilingual, for those deep layers.
+artefact produced from:
+```
+$ cd exllamav2/conversion/standard_cal_data
+$ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 >> techmulcodetiny.utf8
+```
+where: [exllamav2/conversion/standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data) and [techmulcodetiny.utf8](./techmulcodetiny.utf8) produce a file that is used by imatrix for 560~ "chunks"
 
+imatrix run with default sampling settings besides the dataset (i think? i increased the batch number and reduced the batch size so i could cram on more layers but the generation should have been the same in the end)
+(someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.)
+
+# Downloads (eventually)
+
+`upload in progress:`
 
 [IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
 - This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16GB GPUs exist and I may give it a go with stable diffusion
 
-IQ2_M breifly existed before I clobbered (technical term) it. It might be back.
+`under reconstruction:`
+
+[IQ2_M](./Kyllene-57B-v1.0.IQ2_M.gguf/) 2.7 BPW briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. It ~might~ will be back.
+
+`upload scheduled next:`
 
-[IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) (<3.1 BPW)
+[IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) (3.0<s<3.1 BPW)
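
The pipeline the new README describes can be sketched end to end as below. This is a sketch, not the exact commands used: only the `cat` step is quoted from the README; the llama.cpp tool names (`imatrix`, `quantize`, as they were named at the time) and all file paths are assumptions.

```shell
# Hypothetical reproduction of the pipeline described in the README.
# Binary names and paths are assumptions (llama.cpp tools of the era).

# 1) Build the ~900k calibration file from exllamav2's standard_cal_data:
cd exllamav2/conversion/standard_cal_data
cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 >> techmulcodetiny.utf8

# 2) Compute the importance matrix from the Q6_K base model:
./imatrix -m Kyllene-57B-v1.0.q6_K.gguf -f techmulcodetiny.utf8 \
  -o Kyllene-57B-v1.0.q6_K.gguf.imatrix

# 3) Quantize with the imatrix (repeat per target type: IQ2_XS, IQ2_M, IQ3_XXS):
./quantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix \
  Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.IQ2_XS.gguf IQ2_XS
```

The 2-bit "IQ" types need an imatrix to quantize at all sensibly, which is why the matrix and its calibration file ship alongside the quants.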
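
On the "did the maths wrong" note: the expected size of a quant is roughly parameters × bits-per-weight ÷ 8. A quick sanity check, using the README's own figures (57B parameters, 2.38 BPW; the helper function is illustrative, and the real file differs a little because embedding/output tensors use different widths):

```python
def est_size_mib(n_params: float, bpw: float) -> float:
    """Rough GGUF weight-data size: params * bits-per-weight / 8, in MiB."""
    return n_params * bpw / 8 / 2**20

# 57B at 2.38 BPW comes out near 16 GiB of weights, in the same ballpark
# as the reported `CUDA0 buffer size = 15941.43 MiB` -- hence the remark
# that it (just) fits a 16 GB GPU.
print(f"{est_size_mib(57e9, 2.38):.0f} MiB")  # ≈ 16172 MiB
```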