ProphetOfBostrom committed
Commit aadd35f
1 Parent(s): 7d0fc1f

tagged more

Files changed (1): README.md +9 -1
README.md CHANGED
@@ -2,9 +2,17 @@
  license: cc-by-nc-4.0
  library_name: transformers
  pipeline_tag: text-generation
+ tags:
+ - HQQ
+ - mixtral
+ - moe
+ - quantized
+ - 2bit
+
  ---

- ## NeverSleep's [Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) but 17GB at 2BPW+
+ ## NeverSleep's [Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) 2 bit HQQ quant.
+ ## 18.2 GB
  ### the other 14 shannons will be remembered. [HQQ quantized](https://mobiusml.github.io/hqq_blog/) to 2 bits with 4 bit attention. Fits on a 3090 with room to grow. Supports full 32k context. I will not combine those assertions.
  The attention tensors are 4 bit because Mixtral reuses them for every expert - so they add only ~0.4 GB and the quality improves dramatically. See [this](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ) but horny and dying of chatml m<|alig|>nant tokenitis.
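For reference, here is a minimal sketch of what that 4-bit-attention / 2-bit-experts recipe looks like in code, modeled on the Mobius Labs Mixtral example linked above. This is not this repo's exact script, and hqq's API has shifted between versions, so treat the names (`HQQModelForCausalLM`, `BaseQuantizeConfig`, `quantize_model`) as the early-2024 examples rather than a stable interface.

```python
# Sketch of the mixed-precision HQQ recipe described above, following the
# Mobius Labs Mixtral example linked in the README. Assumption: hqq's
# early-2024 API; names and arguments may differ in other versions.
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

base_id = "NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = HQQModelForCausalLM.from_pretrained(base_id)

# Attention projections get 4 bits: each layer's attention is shared by all
# 8 experts, so the extra precision costs only ~0.4 GB over an all-2-bit quant.
attn_params = BaseQuantizeConfig(nbits=4, group_size=64,
                                 quant_zero=True, quant_scale=True)
# Expert FFNs get 2 bits: they hold the overwhelming majority of the weights,
# so this is where the size win comes from.
experts_params = BaseQuantizeConfig(nbits=2, group_size=16,
                                    quant_zero=True, quant_scale=True)

# Per-linear-layer config keyed by module-name suffix, as in the hqq examples.
quant_config = {
    "self_attn.q_proj": attn_params,
    "self_attn.k_proj": attn_params,
    "self_attn.v_proj": attn_params,
    "self_attn.o_proj": attn_params,
    "block_sparse_moe.experts.w1": experts_params,
    "block_sparse_moe.experts.w2": experts_params,
    "block_sparse_moe.experts.w3": experts_params,
}
model.quantize_model(quant_config=quant_config)
```

Loading a prebuilt quant like this one should just be `HQQModelForCausalLM.from_quantized(model_id)` with the repo id, as shown on the linked card.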