These are GGUF quantizations of Aurelian v0.5 70B 32K, an interim checkpoint before v1.0. Download whichever quantization you need (an example download command follows the quant list below).

Please see the above page for more details, and use the instruct prompt format and RoPE scale (linear scaling = 8) described in that link, no matter what context length you actually run at.
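
For reference, a minimal llama.cpp invocation that applies the linear RoPE scale of 8 might look like the line below. This is a sketch only: flag names vary between llama.cpp versions, the Q4_K_M filename is just one of the quants listed below, and the prompt shown is a placeholder rather than the actual instruct format from the model page.

./main -m aurelian-v0.5-70b-rope8-32K.Q4_K_M.gguf -c 32768 --rope-scaling linear --rope-scale 8 -p "<instruct-formatted prompt>"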

These quantizations are untested. Please report back if you run into issues.

Quants:

  • Q6_K (split, needs concatenation, see below)
  • Q5_K_M (largest without file splitting)
  • Q4_K_M
  • Q3_K_M (with importance matrix)
  • Q3_K_S (with importance matrix)
  • Q2_K (with importance matrix)
  • IQ2_XS (with importance matrix)

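One way to fetch a single quant is with the huggingface-cli download command. This is a sketch under assumptions: the repository id below is a placeholder for this repo's actual name, and the Q4_K_M filename is inferred from the split-file naming used for the Q6_K parts.

huggingface-cli download <this-repo-id> aurelian-v0.5-70b-rope8-32K.Q4_K_M.gguf --local-dir .
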
Ongoing experimentation/discussion on SOTA lower-bit quants for this model: here. If you just want a good sub-4-bit quant, please wait for the dust to settle.

Only the Q6_K quant is split and requires concatenation before use.

Linux & macOS:

cat aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-* > aurelian-v0.5-70b-rope8-32K.Q6_K.gguf && rm aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-*

Windows:

COPY /B aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-a + aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-b aurelian-v0.5-70b-rope8-32K.Q6_K.gguf
del aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-a aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-b