I get worse answers than from the original model.
I was testing the original model on a specific case and got very good results, but with the GGUF model supplied in this Space the answers are not as accurate. Maybe I'm not selecting the right file. Which one do you recommend? I tried sqlcoder.Q3_K_M.gguf.
I never use Q3 models because of the reported quality loss, and the model card itself says as much:
sqlcoder.Q3_K_M.gguf | very small, high quality loss
Try any of these instead:
sqlcoder.Q5_K_M.gguf | large, very low quality loss - recommended
sqlcoder.Q6_K.gguf | very large, extremely low quality loss
sqlcoder.Q8_0.gguf | very large, extremely low quality loss - not recommended
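If it helps, here is a minimal sketch of loading one of the recommended quants with llama-cpp-python. The file name, context size, and prompt are assumptions for illustration; adjust them to your setup.

```python
# Sketch: load a recommended quant (Q5_K_M) with llama-cpp-python.
# MODEL_PATH and the prompt below are assumptions -- adapt to your files.
import os

MODEL_PATH = "sqlcoder.Q5_K_M.gguf"  # large, very low quality loss

if os.path.exists(MODEL_PATH):
    # Optional dependency: pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
    out = llm(
        "### Task\nWrite a SQL query that counts users who signed up last week.\n### SQL\n",
        max_tokens=256,
        stop=["###"],
    )
    print(out["choices"][0]["text"])
else:
    print(f"{MODEL_PATH} not found; download it from this repo first")
```

Same code works for Q6_K or Q8_0, just swap the file name; only the load time and memory use change.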
I tried a custom QLoRA model derived from Llama 2 7B, and it was consistent in terms of accuracy and perplexity. But when I quantized that model to Q5_K_M.gguf, perplexity was maintained while accuracy suffered.