R136a1 committed on
Commit
64782fb
1 Parent(s): e1cc7f4

Update README.md

Files changed (1): README.md +3 -1
README.md CHANGED
@@ -13,12 +13,14 @@ language:
 ## Model details
 
 First attempt to quantize a 20B model so it can run on 16GB VRAM with the highest quality possible.
-Quantized at 3.18bpw with hb 6
+Quantized at 3.18bpw with hb 6. 8.13bpw is also available for those who want it (exl2 is very fast with flash-attention, and the quality is (almost) the same as fp16).
 
 Perplexity:
 
 Base = 6.4744
 
+8bpw h8 = 6.4471
+
 3.18 h6 = 6.5705
 
 Dataset = [wikitext](https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet)
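The perplexity figures above compare each quant against the fp16 base on the wikitext test split (lower is better, so the 3.18bpw quant gives up only a small amount of quality). As a minimal sketch of what the metric means (not the exact evaluation script used for these numbers), perplexity is the exponential of the average per-token negative log-likelihood:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over a token stream.

    token_logprobs: natural-log probabilities the model assigned to each
    observed token. In a real evaluation these come from the model's logits.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: four tokens each assigned probability 0.5 -> perplexity 2.0
print(perplexity([math.log(0.5)] * 4))
```

A quant whose perplexity stays close to the base model's (here, 6.4471 vs 6.4744 for the 8bpw h8 quant) is assigning nearly the same probabilities to the test text as the unquantized weights.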