Quantized Quality Bad?

#3
by thewise - opened

I see noticeably different results when comparing Ollama's quantized models with TheBloke's. Is there a real difference?
Ollama performs far better.

@thewise Nope, there should be no difference, since they are basically the same thing.

TheBloke quantizes models with llama.cpp.
Ollama also uses llama.cpp and doesn't do anything special with it, so it's the exact same thing.

The quality gap is probably due to your prompt format (use the correct prompt format that TheBloke gives), your sampling parameters such as temperature, top-p, and top-k, or which quantization level you chose.
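To narrow this down, it can help to write out both setups side by side and see which knobs actually differ. Here is a rough Python sketch of that idea. The `diff_settings` helper and every value in the two dictionaries are made up for illustration; they are not the real defaults of Ollama or of any TheBloke upload, so substitute whatever you actually ran.

```python
def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical values, only to show the shape of the comparison.
ollama_run = {
    "quant": "q4_0",
    "prompt_template": "model-specific chat template",
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 40,
}
thebloke_run = {
    "quant": "q2_K",
    "prompt_template": "raw text, no template",
    "temperature": 1.0,
    "top_p": 0.9,
    "top_k": 40,
}

mismatches = diff_settings(ollama_run, thebloke_run)
for name, (left, right) in mismatches.items():
    print(f"{name}: {left!r} vs {right!r}")
```

Any setting that shows up in the output is a candidate explanation for the quality gap. Align them one at a time and re-test before concluding that the files themselves differ.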

TheBloke provides q2 through q8 (a lower number means fewer bits per weight and worse quality).
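If you want a feel for why fewer bits hurt, here is a toy Python sketch. It uses plain symmetric round-to-nearest quantization over random numbers, which is an assumption made for simplicity and is not the block-wise scheme llama.cpp actually uses. The function names and the random "weights" are illustrative only.

```python
import random


def quantize_roundtrip(weights, bits):
    """Quantize to signed `bits`-bit integers with one shared scale, then dequantize."""
    levels = 2 ** (bits - 1) - 1  # largest integer magnitude, e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


random.seed(0)
# Stand-in for real model weights: 4096 samples from a unit Gaussian.
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

for bits in (2, 4, 8):
    print(f"{bits}-bit mean abs error: {mean_abs_error(weights, bits):.4f}")
```

The error shrinks as the bit count grows, which is the same trend behind q2 sounding worse than q8, even though real quantizers are considerably smarter than this.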
