ProphetOfBostrom committed
Commit aadd35f
1 Parent(s): 7d0fc1f

tagged more

Files changed (1): README.md +9 -1
README.md CHANGED
@@ -2,9 +2,17 @@
  license: cc-by-nc-4.0
  library_name: transformers
  pipeline_tag: text-generation
+ tags:
+ - HQQ
+ - mixtral
+ - moe
+ - quantized
+ - 2bit
+
  ---

- ## NeverSleep's [Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) but 17GB at 2BPW+
+ ## NeverSleep's [Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) 2 bit HQQ quant.
+ ## 18.2 GB
  ### the other 14 shannons will be remembered. [HQQ quantized](https://mobiusml.github.io/hqq_blog/) to 2 bits with 4 bit attention. Fits on a 3090 with room to grow. Supports full 32k context. I will not combine those assertions.
  The attention tensors are 4 bit because Mixtral reuses them for every expert - so they add only ~0.4 GB and the quality improves dramatically. See [this](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ) but horny and dying of chatml m<|alig|>nant tokenitis.
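For reference, here is a minimal sketch of what that 4-bit-attention / 2-bit-experts recipe looks like in code, modeled on the Mobius Labs Mixtral example linked above. This is not this repo's exact script, and hqq's API has shifted between versions, so treat the names (`HQQModelForCausalLM`, `BaseQuantizeConfig`, `quantize_model`) as the early-2024 examples rather than a stable interface.

```python
# Sketch of the mixed-precision HQQ recipe described above, following the
# Mobius Labs Mixtral example linked in the README. Assumption: hqq's
# early-2024 API; names and arguments may differ in other versions.
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

base_id = "NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = HQQModelForCausalLM.from_pretrained(base_id)

# Attention projections get 4 bits: each layer's attention is shared by all
# 8 experts, so the extra precision costs only ~0.4 GB over an all-2-bit quant.
attn_params = BaseQuantizeConfig(nbits=4, group_size=64,
                                 quant_zero=True, quant_scale=True)
# Expert FFNs get 2 bits: they hold the overwhelming majority of the weights,
# so this is where the size win comes from.
experts_params = BaseQuantizeConfig(nbits=2, group_size=16,
                                    quant_zero=True, quant_scale=True)

# Per-linear-layer config keyed by module-name suffix, as in the hqq examples.
quant_config = {
    "self_attn.q_proj": attn_params,
    "self_attn.k_proj": attn_params,
    "self_attn.v_proj": attn_params,
    "self_attn.o_proj": attn_params,
    "block_sparse_moe.experts.w1": experts_params,
    "block_sparse_moe.experts.w2": experts_params,
    "block_sparse_moe.experts.w3": experts_params,
}
model.quantize_model(quant_config=quant_config)
```

Loading a prebuilt quant like this one should just be `HQQModelForCausalLM.from_quantized(model_id)` with the repo id, as shown on the linked card.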