ProphetOfBostrom committed
Commit b87d0c4
1 Parent(s): 63fd687

i'm going outside

Files changed (1): README.md +6 -27
README.md CHANGED
@@ -2,17 +2,10 @@
  license: other
  license_name: yi-license
  license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
- tags:
- - merge
- - GGUF
- - imatrix
- - 2bit
  ---
- # Kyllene-57B
- [Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw, starting from [TeeZee's Q6_K GGUF](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q6_K.gguf)
- ### Please ❤️like❤️/📧comment📧/💌mail me some anthrax spores💌 if you use these! The download ticker won't work on a repo like this, so there's no feedback. I'm not wasting my time, right?
- #### NOTICE: I did not use the original file! I started from the Q6_K (there was no Q8, and more precision than that would be absurd for this). There may well be problems with these quants, but I'll eat my own entire ass if a 57B Q6_K (>6.5 bpw) is the root of any of them. More suspect is how I produced the imatrix.
- [imatrix included,](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) generated from [a 900 KB text file, also included](./techmulcodetiny.utf8)
+ nvm, looks like the tokenizer on the source file is broken anyway. probably the base model's too. it loves `</s>` for some reason, but Yi doesn't use that?
+ no downloads; make it yourself, the imatrix works. i'm deadass sick of this. do people not test these things?
+ ---
  this file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a 900 KB file of coherent text only, with some formatting and code but no endless broken HTML tags or nonsense. it includes multilingual text, for those deep layers. (a sketch of the concatenation is below the diff.)
  artefact produced from:
  ```
@@ -24,20 +17,6 @@ where: [exllamav2/conversion/standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data)
  imatrix run with default sampling settings besides the dataset (i think? i increased the batch count and reduced the batch size so i could cram more layers on, but the generation should have been the same in the end)
  (someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.)
 
- # Downloads (eventually)
- under consideration:
- - Q2_K_S (imatrix-only, but I think it's compatible with older things. I'm not very sure what this one is.)
- - Q2_K (should be strictly better than [the original](/TeeZee/Kyllene-57B-v1.0-GGUF/blob/main/Kyllene-57B-v1.0.q2_K.gguf), but this may be where my --allow-requantize comes back to bite me. we'll see)
-
- ```upload in progress: (probably done by now)```
-
- [IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
- - This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16 GB GPUs exist, and I may give it a go with Stable Diffusion
-
- ```upload scheduled in order: (big gpuboys just have to wait)```
-
- [IQ2_M](./Kyllene-57B-v1.0.IQ2_M.gguf/) 2.7 BPW
- - briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. it ~~might~~ will be back.
-
- [IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) 3.0 < `size` < 3.1 BPW
- - 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 GPU layers; same idea in koboldcpp). There are 101 layers, and the last one becomes distressed if separated from its K/V cache. It invariably chokes your PCIe lanes to death as a survival mechanism. Nature is beautiful.
+ # Downloads (eat my ass, huggingface: just leave the cryptic git-lfs error message on the far side of a 3-hour upload over LTE, thanks)
+ no downloads anymore. i've uploaded 50 gigabytes so far and fucking none of them made it past the great wall of git-lfs.
+ you have the imatrix and the q6: do it yourself (command sketches below). IQ2_M is probably the pick for a 24 GB card; IQ3_XXS is better if you keep the K/V cache off the GPU (-nkvo).
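
For anyone rebuilding the calibration file: a minimal sketch of the concatenation step described in the README above. Which .utf8 files from exllamav2's standard_cal_data went in (and in what order) is my guess from the name `techmulcodetiny.utf8`; treat the filenames as assumptions.

```
# guess at reproducing techmulcodetiny.utf8; the file list and order
# are inferred from the name, not confirmed by the author
git clone https://github.com/turboderp/exllamav2
cd exllamav2/conversion/standard_cal_data
cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 > techmulcodetiny.utf8
```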
 
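A sketch of an imatrix run along the lines described above (defaults except the dataset, smaller batches to fit more layers on the GPU). Binary name as in llama.cpp of the period; the -b and -ngl values are placeholders, not the numbers actually used.

```
# llama.cpp imatrix over the Q6_K with the concatenated calibration text;
# -b (batch size) and -ngl (GPU layers) here are illustrative only
./imatrix -m Kyllene-57B-v1.0.q6_K.gguf \
          -f techmulcodetiny.utf8 \
          -o Kyllene-57B-v1.0.q6_K.gguf.imatrix \
          -b 256 -ngl 40
```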
 
 
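For the "make it yourself" route: a sketch of requantizing the Q6_K down with the included imatrix, using llama.cpp's quantize tool. The output type is your pick (IQ2_M shown).

```
# requantize the Q6_K using the imatrix; --allow-requantize is needed
# because the input is itself already a quant (see the NOTICE above)
./quantize --allow-requantize \
           --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix \
           Kyllene-57B-v1.0.q6_K.gguf \
           Kyllene-57B-v1.0.IQ2_M.gguf IQ2_M
```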
 
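And for the IQ3_XXS-on-24-GB trick mentioned above, a sketch using llama.cpp's main (koboldcpp exposes the same switches through its own options); context size is a placeholder.

```
# 100 of the model's 101 layers on the GPU, K/V cache kept in system RAM
# (-nkvo = --no-kv-offload); saves VRAM at the cost of PCIe traffic
./main -m Kyllene-57B-v1.0.IQ3_XXS.gguf -ngl 100 -nkvo -c 4096 -i
```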