internlm2-limarp-chat-20b.Q4_K_S_imx.gguf vs internlm2-limarp-chat-20b.Q4_K_S.gguf

#2 opened by Alastar-Smith

Hello!

Really like this model!
Can you please explain the difference between the IMX and regular GGUFs?
I've googled it but found nothing useful.

Thank you in advance!

Hello! I'm glad you like my model.
IMX here refers to quantizations made with the recent importance matrix (imatrix) feature from llama.cpp: an importance matrix is computed from a calibration text and then used to weight the quantization, so these quants should perform slightly better while staying the same size. There is a rough sketch of the commands involved below the links.
You can read more about this feature in these pull requests:
https://github.com/ggerganov/llama.cpp/pull/4861
https://github.com/ggerganov/llama.cpp/pull/4930
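
For anyone curious, this is roughly how an imatrix quant is produced. It's a minimal sketch, assuming a llama.cpp build from around the time of those pull requests; the model and calibration file names are just placeholders, and newer builds rename the binaries to llama-imatrix and llama-quantize.

```
# 1) Compute an importance matrix from a calibration text
#    (model and text file names here are placeholders)
./imatrix -m internlm2-limarp-chat-20b.f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize as usual, passing the importance matrix so it can
#    weight which values matter most during rounding
./quantize --imatrix imatrix.dat \
    internlm2-limarp-chat-20b.f16.gguf \
    internlm2-limarp-chat-20b.Q4_K_S_imx.gguf Q4_K_S
```

The regular Q4_K_S file is made with the same quantize command, just without the --imatrix option.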

Thank you for the answer! Got it! Sounds like a nice improvement over the old GGUFs!
