macadeliccc
/

laser-dolphin-mixtral-2x7b-dpo

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

macadeliccc commited on Feb 10

Commit

1d48e74

•

1 Parent(s): 38bcb59

Update README.md

Files changed (1) hide show

README.md +19 -7

README.md CHANGED Viewed

@@ -33,9 +33,23 @@ This model is a medium-sized MoE implementation based on [cognitivecomputations/
 ## Quantizations
-**These Quants will result in unpredicted behavior. New quants are available as I have updated the model**
-Quatizations provided by [TheBloke](https://huggingface.co/TheBloke/laser-dolphin-mixtral-2x7b-dpo-GGUF)
 ### GGUF
@@ -45,13 +59,11 @@ Quatizations provided by [TheBloke](https://huggingface.co/TheBloke/laser-dolphi
 *Current AWQ [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-AWQ)
-# ExLlamav2
-Thanks to user [bartowski](https://huggingface.co/bartowski) we now have exllamav2 quantizations in 3.5 through 8 bpw. They are available here:
-+ [bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2)
-His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
 ## HF Spaces
 + GGUF chat available [here](https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat-GGUF)

 ## Quantizations
+### ExLlamav2
+_These are the recommended quantizations for users that are running the model on GPU_
+Thanks to user [bartowski](https://huggingface.co/bartowski) we now have exllamav2 quantizations in 3.5 through 8 bpw. They are available here:
++ [bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2)
+His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
+| Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
+| ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
+| [8_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/8_0) | 8.0 | 8.0 | 13.7 GB | 15.1 GB | 17.2 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
+| [6_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/6_5) | 6.5  | 8.0 | 11.5 GB | 12.9 GB | 15.0 GB | Near unquantized performance at vastly reduced size, **recommended**. |
+| [5_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/5_0) | 5.0  | 6.0 | 9.3 GB | 10.7 GB | 12.8 GB | Slightly lower quality vs 6.5, great for 12gb cards with 16k context. |
+| [4_25](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/4_25) | 4.25 | 6.0 | 8.2 GB | 9.6 GB | 11.7 GB | GPTQ equivalent bits per weight. |
+| [3_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/3_5) | 3.5  | 6.0 | 7.0 GB | 8.4 GB | 10.5 GB | Lower quality, not recommended. |
 ### GGUF
 *Current AWQ [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-AWQ)
+### TheBloke
+**These Quants will result in unpredicted behavior. New quants are available as I have updated the model**
+Quatizations provided by [TheBloke](https://huggingface.co/TheBloke/laser-dolphin-mixtral-2x7b-dpo-GGUF)
 ## HF Spaces
 + GGUF chat available [here](https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat-GGUF)