File size: 3,320 Bytes
84e3837 a747c65 84e3837 4f670b0 5e16f7a 6dc2008 41238ab 6dc2008 5e16f7a 84e3837 5e16f7a 1b192a7 f6d1b96 6dc2008 5e16f7a b87d0c4 5e16f7a a747c65 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
---
license: cc-by-nc-2.0
tags:
- imatrix
---
#### nah, and it looks like the tokenizer on the source file's broken anyway. probably the base model too. loves `</s>` for some reason but Yi doesn't use that?
made from [TeeZee/Kyllene-57B-v1.0.q6_k.gguf](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q6_K.gguf)
no quants here to download. i did try. make it yourself; the imatrix works and i'm feeling very irritable now. do people not test these things? I know git-lfs hasn't been subject to any QA ever so maybe?
the dataset file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data). a 900kb file of coherent text only, with some formatting and code but no endless broken html tags or nonsense. includes multilingual, for those deep layers.
like this:
```
$ cd exllamav2/conversion/standard_cal_data
$ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 > techmulcodetiny.utf8
```
reference to: [exllamav2/conversion/standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data) and [techmulcodetiny.utf8](./techmulcodetiny.utf8) produce a file that is used by imatrix for 560~ "chunks"
imatrix was run with default sampling settings besides the dataset (i think? i increased the batch number and reduced the batch size so i could cram on more layers but the generation should have been the same in the end)
(someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.) (**UPDATE** found the command I used. use at your peril and obviously fix the paths)
```
imatrix -m Kyllene-57B-v1.0.q6_K.gguf -f ~/exltabbytorcher220/exllamav2/conversion/standard_cal_data/techmulcodetiny.utf8 -o Kyllene-57B-v1.0.q6_K.gguf.imat --verbosity 1 -ngl 50 -cb -t 3 -b 256 --no_mmap
```
51 layers was too many on a 3090 and I had to kill wayland (pro tip: tmux). needless to say you'll probably die if you tried something idiotic like using this on windows
--no_mmap was appropriate on my nigtmare vortex of 32GB DDR4, layered swap,tiny zrams and weird kernel parameters but maybe just omit it.
how-to because i'm grouchy but I did actually want people to have these. Remember to replace IQ2_M (appears only twice, near the end) with whatever you fancy. Q2_K might be more compatible.
```
~]$ git clone https://github.com/ggerganov/llama.cpp
~]$ cd llama.cpp
if you're like me and you break llamas for fun and don't understand cmake: git switch master && git pull; git restore Makefile
otherwise
llama.cpp]$ git pull; make -j
llama.cpp]$ ./quantize --allow-requantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix INPUT_DIRECTORY/Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.IQ2_M.gguf IQ2_M
```
if your computer has less than 8 cores, add the number of cores to the end of this (there's an invisible 8 by default). and yes, you can just use ./ (llama.cpp) as INPUT_DIRECTORY
# Downloads (eat my ass huggingface yeah just leave the cryptic git lfs error message on the far side of a 3 hour upload over LTE thanks)
no downloads now. ive uploaded 50 gigabytes so far and none of them made it past the great wall of git-lfs
you have the imatrix and the q6, DIY. IQ2_M probably for a 24GB device, IQ3XXS for better with kv offload. |