Quantized Quality Bad?

by thewise - opened Feb 21

Feb 21

I can see distinct results when comparing OLLAMA Quantized models v/s TheBloke. Is there a real difference?
OLLAMA Performs way superior

YaTharThShaRma999

Feb 21

•

edited Feb 21

@thewise nope there should not be 0 difference as they are basically the same thing.

Thebloke quantizes model with llama.cpp
OLLAMA uses llama.cpp and they dont do anything special with it so its the same exact thing.

The quality is probably because of either your prompt format(use the correct prompt format that thebloke gives), sampling parameters like temp, top p, top k and which q model you chose.

thebloke provides q2 to q8(lower is worse)

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment