Update README.md
IQ2_MR_144L : A 2.66bpw quant. Same features, PPL512 eng is 3.80, PPL512 fr is 3…
IQ2_SR_144L : A 2.58bpw quant. Same features, PPL512 eng is 3.87, PPL512 fr is 3.32. 80k+ context in KV q5_1/iq4_nl, bbs64.
IQ2_XSR_144 : A 2.45bpw quant. Same features, PPL512 eng is 4.07, PPL512 fr is 3.36. 95k+ context in KV q5_1/iq4_nl, bbs64.
-> These last quants are also almost perfectly symmetrical for 2 GPUs with ts 44,45, and for 4 GPUs (for example 4x RTX 3060, 4060 Ti, or A4000) with ts 22,22,22,23.
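Those split ratios can be passed to llama.cpp directly. A hedged sketch of the invocation (the GGUF filename and context length are placeholders, not part of this repo; the KV cache types match the q5_1/iq4_nl setup described above):

```shell
# Hypothetical llama.cpp invocation; adjust -m to the quant file you downloaded.
# -ctk/-ctv set the KV cache types (q5_1 keys, iq4_nl values);
# -fa enables flash attention, which a quantized V cache requires;
# -ts splits the tensors across GPUs in the given ratio.

# 2 GPUs, near-symmetric 44:45 split:
llama-server -m model-IQ2_XSR_144.gguf -ngl 99 -c 32768 \
  -ctk q5_1 -ctv iq4_nl -fa -ts 44,45

# 4 GPUs (e.g. 4x RTX 3060), near-even split:
llama-server -m model-IQ2_XSR_144.gguf -ngl 99 -c 32768 \
  -ctk q5_1 -ctv iq4_nl -fa -ts 22,22,22,23
```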
To achieve that, I slightly shrunk the quantization of some of the last 25% of the layers to match the size of the Q6_K output_weight.
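The bpw figures and split ratios above translate directly into file size and per-GPU share. A minimal sketch of the arithmetic (the 120B parameter count is a hypothetical example, not a claim about this model):

```python
# Sketch: estimate a GGUF file's size from its average bits-per-weight (bpw),
# and the fraction of the model each GPU receives under a --tensor-split ratio.

def quant_size_gib(n_params: float, bpw: float) -> float:
    """Approximate model size in GiB: params * bpw bits, converted to GiB."""
    return n_params * bpw / 8 / 2**30

def split_shares(ts: list[int]) -> list[float]:
    """Fraction of the tensors each GPU gets under a tensor-split ratio."""
    total = sum(ts)
    return [t / total for t in ts]

# Hypothetical 120e9-parameter model at the IQ2_XSR 2.45 bpw:
print(round(quant_size_gib(120e9, 2.45), 1))  # rough size in GiB
print(split_shares([22, 22, 22, 23]))         # near-even 4-GPU shares
```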