
GGUF quants for mistralai/Mistral-7B-v0.1 using llama.cpp

Terms of Use: Please check the original model


Quants

  • q2_k: Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors.
  • q3_k_s: Uses Q3_K for all tensors
  • q3_k_m: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K
  • q3_k_l: Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K
  • q4_0: Original quant method, 4-bit.
  • q4_1: Higher accuracy than q4_0 but lower than q5_0. Faster inference than the q5 quants.
  • q4_k_s: Uses Q4_K for all tensors
  • q4_k_m: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K
  • q5_0: Higher accuracy than the 4-bit quants, at the cost of higher resource usage and slower inference.
  • q5_1: Even higher accuracy than q5_0, with correspondingly higher resource usage and slower inference.
  • q5_k_s: Uses Q5_K for all tensors
  • q5_k_m: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K
  • q6_k: Uses Q6_K for all tensors
  • q8_0: Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.
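
As a rough guide to the "resource usage" trade-off in the list above, the sketch below estimates file size for the non-k quants from their block layouts. It assumes the usual llama.cpp layout of 32 weights per block with an fp16 scale (plus an fp16 minimum for the `_1` variants) and the 7.24B parameter count reported for this model. The helper names `bits_per_weight` and `estimate_gb` are hypothetical, introduced here for illustration only. Real files will differ somewhat, because some tensors are stored in other types and the file also carries metadata.

```python
# Assumed bytes per 32-weight block for the legacy (non-k) quant formats.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = {
    "q4_0": 16 + 2,          # 32 x 4-bit weights + fp16 scale
    "q4_1": 16 + 2 + 2,      # as q4_0, plus an fp16 minimum
    "q5_0": 16 + 4 + 2,      # 4 low bits each + 32 packed high bits + fp16 scale
    "q5_1": 16 + 4 + 2 + 2,  # as q5_0, plus an fp16 minimum
    "q8_0": 32 + 2,          # 32 x 8-bit weights + fp16 scale
}

N_PARAMS = 7.24e9  # parameter count reported for this model


def bits_per_weight(quant: str) -> float:
    """Average storage cost per weight, including per-block overhead."""
    return BLOCK_BYTES[quant] * 8 / BLOCK_WEIGHTS


def estimate_gb(quant: str, n_params: float = N_PARAMS) -> float:
    """Approximate size in GB if every weight used this quant type."""
    return n_params * bits_per_weight(quant) / 8 / 1e9


if __name__ == "__main__":
    for name in BLOCK_BYTES:
        print(f"{name}: {bits_per_weight(name):.2f} bits/weight, "
              f"~{estimate_gb(name):.1f} GB")
```

The k-quants are left out on purpose: most of them mix several tensor types, as the list describes, so a single bits-per-weight figure would be misleading for them.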
Format: GGUF
Model size: 7.24B params
Architecture: llama

Collection including neopolita/mistral-7b-v0.1-gguf