macadeliccc
committed on
Update README.md
README.md CHANGED
@@ -41,8 +41,6 @@ Thanks to user [bartowski](https://huggingface.co/bartowski) we now have exllama
 
+[bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2)
 
-His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
-
 | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
 | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
 | [8_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/8_0) | 8.0 | 8.0 | 13.7 GB | 15.1 GB | 17.2 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
@@ -51,6 +49,8 @@ His quantizations represent the first ~13B model with GQA support. Check out his
 | [4_25](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/4_25) | 4.25 | 6.0 | 8.2 GB | 9.6 GB | 11.7 GB | GPTQ equivalent bits per weight. |
 | [3_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/3_5) | 3.5 | 6.0 | 7.0 GB | 8.4 GB | 10.5 GB | Lower quality, not recommended. |
 
+His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
+
 ### GGUF
 
 *Current GGUF [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF)*
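
For anyone who wants to pull one of the exl2 branches from the table above programmatically, here is a minimal sketch using `huggingface_hub`; the branch name is passed as the `revision`, and the output directory is just an example:

```python
# Minimal sketch: download a single exl2 quantization branch.
# Each branch in the table above ("8_0", "4_25", "3_5", ...) is a
# git revision of the repo, so it maps to the `revision` argument.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2",
    revision="4_25",  # pick the branch that fits your VRAM budget
    local_dir="laser-dolphin-mixtral-2x7b-dpo-exl2-4_25",  # example path
)
```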
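
Similarly, a minimal sketch for running one of the GGUF quantizations linked above with `llama-cpp-python`; the filename below is a hypothetical placeholder, so check the repo's file listing for the actual quantization names:

```python
# Sketch: fetch one GGUF file and run it with llama-cpp-python.
# The filename is a guess at the repo's naming scheme; see the "Files"
# tab of the GGUF repo for the real filenames.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF",
    filename="laser-dolphin-mixtral-2x7b-dpo.Q4_K_M.gguf",  # hypothetical name
)

llm = Llama(model_path=model_path, n_ctx=4096)
out = llm("What does MoE stand for?", max_tokens=64)
print(out["choices"][0]["text"])
```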