GGUF quants for Rakuten/RakutenAI-7B-chat using llama.cpp
Terms of Use: Please check the original model
Quants
q2_k
: Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors.
q3_k_s
: Uses Q3_K for all tensors.
q3_k_m
: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.
q3_k_l
: Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.
q4_0
: Original quant method, 4-bit.
q4_1
: Higher accuracy than q4_0 but not as high as q5_0. However, it has quicker inference than the q5 models.
q4_k_s
: Uses Q4_K for all tensors.
q4_k_m
: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
q5_0
: Higher accuracy, higher resource usage and slower inference.
q5_1
: Even higher accuracy and resource usage, and slower inference.
q5_k_s
: Uses Q5_K for all tensors.
q5_k_m
: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
q6_k
: Uses Q8_K for all tensors.
q8_0
: Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.
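
As a rough illustration of how the mixed presets in the list above assign types per tensor, the sketch below encodes only the rules that are stated unambiguously (the "half of" presets q4_k_m and q5_k_m are left out, since the list does not say which half). The names PRESETS and tensor_type are hypothetical, the tensor names such as layers.0.attention.wv.weight are assumed for the example, and none of this is llama.cpp's actual implementation.

```python
# Illustrative only: encodes the per-tensor rules as worded in the quant list.
# PRESETS and tensor_type are hypothetical names, not part of llama.cpp.

# preset -> (type for the listed tensors, listed tensor name fragments, type for all others)
PRESETS = {
    "q2_k": ("Q4_K", ("attention.wv", "feed_forward.w2"), "Q2_K"),
    "q3_k_s": ("Q3_K", (), "Q3_K"),
    "q3_k_m": ("Q4_K", ("attention.wv", "attention.wo", "feed_forward.w2"), "Q3_K"),
    "q3_k_l": ("Q5_K", ("attention.wv", "attention.wo", "feed_forward.w2"), "Q3_K"),
    "q4_k_s": ("Q4_K", (), "Q4_K"),
    "q5_k_s": ("Q5_K", (), "Q5_K"),
}


def tensor_type(preset: str, tensor_name: str) -> str:
    """Return the quant type the list describes for one tensor under a preset."""
    special_type, fragments, default_type = PRESETS[preset.lower()]
    # A tensor gets the higher-precision type if its name contains a listed fragment.
    if any(fragment in tensor_name for fragment in fragments):
        return special_type
    return default_type


if __name__ == "__main__":
    # Assumed tensor names, for demonstration only.
    for name in ("layers.0.attention.wv.weight", "layers.0.attention.wq.weight"):
        print(name, "->", tensor_type("q3_k_m", name))
```

Read this way, the _s, _m and _l suffixes differ only in how much extra precision the attention.wv, attention.wo, and feed_forward.w2 tensors receive relative to the rest of the model.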