AWQ model performs significantly worse than the GPTQ model
I had a discussion on the original model card page about issues I was having prompting this model.
https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct/discussions/2
After many different tests, we came to the conclusion that the issue was caused by the AWQ quantization.
Most often, the model would continue generating more text after it had already produced the requested information.
If anyone else is having similar issues, know that it is likely the quantization and not the model itself.
I don't know whether this behavior is specific to this model or common to most AWQ quantizations; if anyone knows, I would be intrigued to find out!
@martinkozle I think this is more about the EOS token than the quantization format.
For some reason, generation doesn't stop at the EOS token. It might be a problem with the inference library or with an EOS token setting.
The text after the answer looks very much like text generated past the EOS token; it's usually completely random, unrelated content.
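For anyone who wants to check this, here is a minimal sketch using the Transformers library. The repo id is an assumption (substitute whichever AWQ quantization you actually loaded), and loading an AWQ checkpoint this way assumes the autoawq package is installed. It verifies that the tokenizer and the model's generation config agree on the EOS token, and passes the EOS id explicitly to `generate`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical AWQ repo id -- substitute the quantization you are using.
model_id = "TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Requires the autoawq package for AWQ checkpoints.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Sanity check: the tokenizer and generation config should agree on EOS.
print("tokenizer EOS:", tokenizer.eos_token, tokenizer.eos_token_id)
print("generation config EOS:", model.generation_config.eos_token_id)

prompt = "[INST] What is the capital of France? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Pass the EOS id explicitly so generation stops when that token is sampled.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the two printed EOS ids disagree, or the rambling stops once the EOS id is passed explicitly, that would point to a configuration problem rather than the quantization itself.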