Files changed (1)
  1. README.md +31 -30
README.md CHANGED
@@ -1,42 +1,43 @@
  ---
- license: cc-by-nc-2.0
  tags:
  - imatrix
  ---
- #### nah, and it looks like the tokenizer on the source file is broken anyway. probably the base model's too. it loves `</s>` for some reason, but Yi doesn't use that token?
- made from [TeeZee/Kyllene-57B-v1.0.q6_k.gguf](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q6_K.gguf)
-
- no quants here to download. i did try. make them yourself; the imatrix works and i'm feeling very irritable now. do people not test these things? I know git-lfs has never been subject to any QA, so maybe not.
-
-
- the dataset file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a ~900 kB file of coherent text only, with some formatting and code but no endless broken html tags or nonsense. it includes multilingual text, for those deep layers.
- like this:
  ```
  $ cd exllamav2/conversion/standard_cal_data
  $ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 > techmulcodetiny.utf8
  ```
- reference: [exllamav2/conversion/standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data) and [techmulcodetiny.utf8](./techmulcodetiny.utf8). the result is a file that imatrix splits into ~560 "chunks"
 
- imatrix was run with default settings apart from the dataset (i think? i increased the batch count and reduced the batch size so i could cram on more layers, but the end result should have been the same)
- (someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.) (**UPDATE**: found the command I used. use at your peril and obviously fix the paths)
- ```
- imatrix -m Kyllene-57B-v1.0.q6_K.gguf -f ~/exltabbytorcher220/exllamav2/conversion/standard_cal_data/techmulcodetiny.utf8 -o Kyllene-57B-v1.0.q6_K.gguf.imat --verbosity 1 -ngl 50 -cb -t 3 -b 256 --no_mmap
- ```
- 51 layers was too many on a 3090 and I had to kill wayland (pro tip: tmux). needless to say, you'll probably die if you try something idiotic like running this on windows.
- --no_mmap was appropriate on my nightmare vortex of 32GB DDR4, layered swap, tiny zrams and weird kernel parameters, but maybe just omit it.
 
- how-to, because i'm grouchy but I did actually want people to have these. Remember to replace IQ2_M (it appears only twice, near the end) with whatever quant type you fancy. Q2_K might be more compatible.
- ```
- ~]$ git clone https://github.com/ggerganov/llama.cpp
- ~]$ cd llama.cpp
- # if you're like me and you break llamas for fun and don't understand cmake:
- llama.cpp]$ git switch master && git pull; git restore Makefile
- # otherwise just:
- llama.cpp]$ git pull
- # then build and quantize:
- llama.cpp]$ make -j
- llama.cpp]$ ./quantize --allow-requantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix INPUT_DIRECTORY/Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.IQ2_M.gguf IQ2_M
- ```
 
 
- if your computer has fewer than 8 cores, add your core count to the end of that last command (there's an invisible 8 there by default). and yes, you can just use ./ (the llama.cpp directory) as INPUT_DIRECTORY
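- for example, on a hypothetical 4-core machine (the trailing 4 is that optional thread count; everything else is the command from above, with ./ as INPUT_DIRECTORY):
- ```
- llama.cpp]$ ./quantize --allow-requantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix ./Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.IQ2_M.gguf IQ2_M 4
- ```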
 
- # Downloads (eat my ass huggingface, yeah, just leave the cryptic git-lfs error message on the far side of a 3-hour upload over LTE, thanks)
- no downloads now. i've uploaded 50 gigabytes so far and none of them made it past the great wall of git-lfs.
- you have the imatrix and the q6, DIY. IQ2_M probably for a 24GB device, IQ3_XXS for better quality with kv offload.
 
  ---
+ license: other
+ license_name: yi-license
+ license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
  tags:
+ - merge
+ - GGUF
  - imatrix
+ - 2bit
  ---
+ # Kyllene-57B
+ [Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF, starting from [the Q6_K GGUF](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q6_K.gguf)
+ ### Please ❤️like❤️/📧comment📧/💌mail me some anthrax spores💌 if you use these! The download ticker won't work on a repo like this, so there's no feedback. I'm not wasting my time, right?
+ #### NOTICE: I did not use the original file! I started from the Q6_K (there was no Q8, and more precision for this would be absurd). There may well be problems with these quants, but I'll eat my own entire ass if a 57B Q6_K (>6.5 bpw) is the root of any of them. More suspect is how I produced the imatrix.
+ [imatrix included.](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) generated from [a ~900 kB text file, also included](./techmulcodetiny.utf8)
+ this file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a ~900 kB file of coherent text only, with some formatting and code but no endless broken html tags or nonsense. it includes multilingual text, for those deep layers.
+ artefact produced from:
 
  ```
  $ cd exllamav2/conversion/standard_cal_data
  $ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 > techmulcodetiny.utf8
  ```
+ where [exllamav2/conversion/standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data) is the source and [techmulcodetiny.utf8](./techmulcodetiny.utf8) is the result: a file that imatrix splits into ~560 "chunks"
 
+ imatrix was run with default settings apart from the dataset (i think? i increased the batch count and reduced the batch size so i could cram on more layers, but the end result should have been the same)
+ (someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.)
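+ for reference, a sketch of the sort of invocation that implies (the calibration file is assumed to sit next to the model, the output name is assumed to match the included .imatrix, and -ngl/-t/-b reflect my cramped setup; adjust or drop them):
+ ```
+ imatrix -m Kyllene-57B-v1.0.q6_K.gguf -f techmulcodetiny.utf8 -o Kyllene-57B-v1.0.q6_K.gguf.imatrix --verbosity 1 -ngl 50 -cb -t 3 -b 256 --no_mmap
+ ```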
 
 
 
 
 
+ # Downloads (eventually)
+ under consideration:
+ - Q2_K_S (imatrix-only, but I think it's compatible with older things. I'm not very sure what this is.)
+ - Q2_K (should be strictly better than [the original](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q2_K.gguf), but this may be where my --allow-requantize comes back to bite me; we'll see. recipe below if you're impatient)
+
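+ a sketch of that recipe, assuming the Q6_K from TeeZee's repo and the imatrix from this one are sitting in your llama.cpp directory (swap Q2_K for Q2_K_S if that's what you want):
+ ```
+ ./quantize --allow-requantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.Q2_K.gguf Q2_K
+ ```
+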
+ ```upload in progress: (probably done by now)```
+
+ [IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
+ - This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16GB GPUs exist and I may give it a go with stable diffusion
+
+ ```upload scheduled in order: (big gpuboys just have to wait)```
 
+ [IQ2_M](./Kyllene-57B-v1.0.IQ2_M.gguf/) 2.7 BPW
+ - briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. It ~~might~~ will be back.
 
+ [IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) 3.0<`size`<3.1 BPW
+ - 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 layers, in koboldcpp). There are 101 layers and the last one becomes distressed if separated from its K/V cache. Invariably chokes your PCIe lanes to death as a survival mechanism. Nature is beautiful.
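+ roughly, with llama.cpp's ./main (model filename assumed; koboldcpp users, set the equivalents in its launcher):
+ ```
+ # 100 of the 101 layers on the GPU; -nkvo keeps the K/V cache in system RAM
+ ./main -m Kyllene-57B-v1.0.IQ3_XXS.gguf -ngl 100 -nkvo -p "your prompt here"
+ ```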