R136a1 committed on
Commit
64782fb
1 Parent(s): e1cc7f4

Update README.md

Files changed (1): README.md +3 -1
README.md CHANGED
@@ -13,12 +13,14 @@ language:
 ## Model details
 
 First attempt to quantize a 20B model so it can run on 16GB VRAM with the highest quality possible.
-Quantized at 3.18bpw with hb 6
+Quantized at 3.18bpw with hb 6. 8.13bpw is also available for those who want it (exl2 is very fast with flash-attention, and the quality is (almost) the same as fp16).
 
 Perplexity:
 
 Base = 6.4744
 
+8bpw h8 = 6.4471
+
 3.18 h6 = 6.5705
 
 Dataset = [wikitext](https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet)
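The perplexity figures above compare each quant against the fp16 base on the wikitext test split (lower is better, so the 3.18bpw quant gives up only a small amount of quality). As a minimal sketch of what the metric means (not the exact evaluation script used for these numbers), perplexity is the exponential of the average per-token negative log-likelihood:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over a token stream.

    token_logprobs: natural-log probabilities the model assigned to each
    observed token. In a real evaluation these come from the model's logits.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: four tokens each assigned probability 0.5 -> perplexity 2.0
print(perplexity([math.log(0.5)] * 4))
```

A quant whose perplexity stays close to the base model's (here, 6.4471 vs 6.4744 for the 8bpw h8 quant) is assigning nearly the same probabilities to the test text as the unquantized weights.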