Update README.md
IQ2_MR_144L : A 2.66bpw quant. Same features, PPL512 eng is 3.80, PPL512 fr is 3…
IQ2_SR_144L : A 2.58bpw quant. Same features, PPL512 eng is 3.87, PPL512 fr is 3.32. 80k+ context in KV q5_1/iq4_nl, bbs64.
IQ2_XSR_144 : A 2.45bpw quant. Same features, PPL512 eng is 4.07, PPL512 fr is 3.36. 95k+ context in KV q5_1/iq4_nl, bbs64.
-> These last quants are also almost perfectly symmetrical for 2 GPUs with ts 44,45, and for 4 GPUs (for example 4x RTX 3060, 4060 Ti, or A4000) with ts 22,22,22,23.
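Those split ratios can be passed to llama.cpp directly. A hedged sketch of the invocation (the GGUF filename and context length are placeholders, not part of this repo; the KV cache types match the q5_1/iq4_nl setup described above):

```shell
# Hypothetical llama.cpp invocation; adjust -m to the quant file you downloaded.
# -ctk/-ctv set the KV cache types (q5_1 keys, iq4_nl values);
# -fa enables flash attention, which a quantized V cache requires;
# -ts splits the tensors across GPUs in the given ratio.

# 2 GPUs, near-symmetric 44:45 split:
llama-server -m model-IQ2_XSR_144.gguf -ngl 99 -c 32768 \
  -ctk q5_1 -ctv iq4_nl -fa -ts 44,45

# 4 GPUs (e.g. 4x RTX 3060), near-even split:
llama-server -m model-IQ2_XSR_144.gguf -ngl 99 -c 32768 \
  -ctk q5_1 -ctv iq4_nl -fa -ts 22,22,22,23
```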
To achieve that, I slightly shrunk the quantization of some of the last 25% of the layers to match the size of the Q6_K output_weight.
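The bpw figures and split ratios above translate directly into file size and per-GPU share. A minimal sketch of the arithmetic (the 120B parameter count is a hypothetical example, not a claim about this model):

```python
# Sketch: estimate a GGUF file's size from its average bits-per-weight (bpw),
# and the fraction of the model each GPU receives under a --tensor-split ratio.

def quant_size_gib(n_params: float, bpw: float) -> float:
    """Approximate model size in GiB: params * bpw bits, converted to GiB."""
    return n_params * bpw / 8 / 2**30

def split_shares(ts: list[int]) -> list[float]:
    """Fraction of the tensors each GPU gets under a tensor-split ratio."""
    total = sum(ts)
    return [t / total for t in ts]

# Hypothetical 120e9-parameter model at the IQ2_XSR 2.45 bpw:
print(round(quant_size_gib(120e9, 2.45), 1))  # rough size in GiB
print(split_shares([22, 22, 22, 23]))         # near-even 4-GPU shares
```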