Nexesenex committed · verified · commit 5bc82e0 · parent 7d4bc07

Update README.md

Files changed (1): README.md (+1 -1)

README.md
@@ -16,7 +16,7 @@ IQ2_MR_144L : A 2.66bpw quant. Same features, PPL512 eng is 3.80, PPL512 fr is 3
 
 IQ2_SR_144L : A 2.58bpw quant. Same features, PPL512 eng is 3.87, PPL512 fr is 3.32. 80k+ context in kv q51/iq4nl bbs64.
 
-IQ2_XSR_144 : A 2.45bpw quant. Same features, PPL512 eng is 4.07, PPL512 fr is 3.36. 80k+ context in kv q51/iq4nl bbs64.
+IQ2_XSR_144 : A 2.45bpw quant. Same features, PPL512 eng is 4.07, PPL512 fr is 3.36. 95k+ context in kv q51/iq4nl bbs64.
 
 -> These last quants are also almost perfectly symmetrical for 2 GPUs with ts 44-45, and for 4 GPUs (for example 4x RTX 3060, 4060 Ti, or A4000) with ts 22,22,22,23.
 To achieve that, I shrank the quantization of some of the last 25% of the layers slightly, to match the size of the Q6_K output_weight.
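
For reference, a minimal sketch of how the tensor split and KV-cache settings above might map to llama.cpp server flags. The model filename and context length are illustrative placeholders, and the quantized K/V cache types assume a build that supports them with flash attention enabled:

```bash
# Hypothetical invocation; the filename is a placeholder, not a published file.
# -ts 22,22,22,23       : the 4-GPU tensor split described above
# -ctk q5_1 -ctv iq4_nl : the "kv q51/iq4nl" cache quantization
./llama-server -m model-IQ2_XSR_144.gguf \
  -ngl 99 -fa \
  -ts 22,22,22,23 \
  -c 98304 \
  -ctk q5_1 -ctv iq4_nl
```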