
My own (ZeroWw) quantizations.
Output and embed tensors are quantized to f16.
All other tensors are quantized to q5_k or q6_k.
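
This recipe can be reproduced with llama.cpp's llama-quantize tool, which can pin the output and token-embedding tensors to a specific type. A minimal sketch, assuming you already have an f16 GGUF conversion of the model; the file names are placeholders:

```bash
# Keep output and token-embedding tensors at f16, quantize the rest to q6_k
./llama-quantize --output-tensor-type f16 --token-embedding-type f16 \
    Model-f16.gguf Model.f16.q6.gguf q6_k

# Same recipe with q5_k for the remaining tensors
./llama-quantize --output-tensor-type f16 --token-embedding-type f16 \
    Model-f16.gguf Model.f16.q5.gguf q5_k
```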

Result:
both the f16.q6 and f16.q5 files are smaller than the standard q8_0 quantization,
and they perform as well as the pure f16.
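
The resulting files load like any other GGUF. A minimal usage sketch, assuming a local llama.cpp build; the repository and file names below are placeholders:

```bash
# Download one of the quantized files from the Hugging Face Hub (placeholder names)
huggingface-cli download ZeroWw/<model-repo> <Model>.f16.q6.gguf --local-dir .

# Run it with llama.cpp
./llama-cli -m <Model>.f16.q6.gguf -p "Hello," -n 64
```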

Format: GGUF
Model size: 8.03B params
Architecture: llama