These are GGUF quantizations of Aurelian v0.5 70B 32K, an interim checkpoint before v1.0. Download whichever quantization you need (an example download command follows the quant list below).

Please see the above page for more details, and use the instruct prompt format and RoPE scale (linear scaling = 8) described in that link, no matter what context length you actually run at.
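
For reference, a minimal llama.cpp invocation that applies the linear RoPE scale of 8 might look like the line below. This is a sketch only: flag names vary between llama.cpp versions, the Q4_K_M filename is just one of the quants listed below, and the prompt shown is a placeholder rather than the actual instruct format from the model page.

./main -m aurelian-v0.5-70b-rope8-32K.Q4_K_M.gguf -c 32768 --rope-scaling linear --rope-scale 8 -p "<instruct-formatted prompt>"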

These quantizations are untested. Please report back if you run into issues.

Quants:

  • Q6_K (split, needs concatenation, see below)
  • Q5_K_M (largest without file splitting)
  • Q4_K_M
  • Q3_K_M (with importance matrix)
  • Q3_K_S (with importance matrix)
  • Q2_K (with importance matrix)
  • IQ2_XS (with importance matrix)

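One way to fetch a single quant is with the huggingface-cli download command. This is a sketch under assumptions: the repository id below is a placeholder for this repo's actual name, and the Q4_K_M filename is inferred from the split-file naming used for the Q6_K parts.

huggingface-cli download <this-repo-id> aurelian-v0.5-70b-rope8-32K.Q4_K_M.gguf --local-dir .
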
Ongoing experimentation/discussion on SOTA lower-bit quants for this model: here. If you just want a good sub-4-bit quant, please wait for the dust to settle.

Only the Q6_K quant is split and requires concatenation before use.

Linux & macOS:

cat aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-* > aurelian-v0.5-70b-rope8-32K.Q6_K.gguf && rm aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-*

Windows:

COPY /B aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-a + aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-b aurelian-v0.5-70b-rope8-32K.Q6_K.gguf
del aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-a aurelian-v0.5-70b-rope8-32K.Q6_K.gguf-split-b