GGML vs GGUF vs GPTQ

opened by HemanthSai7

I'm new to quantization. It'd be very helpful if you could explain the difference between these three formats. Even a link to a blog post would help. Thanks!

GPTQ is a post-training quantization method and weight format designed for GPU-only inference.
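As a rough illustration, a GPTQ checkpoint is usually loaded straight onto the GPU through the Transformers integration. This is only a sketch: the model id is a placeholder for any GPTQ-quantized repo, and it assumes the `auto-gptq`/`optimum` packages and a CUDA GPU are available.

```python
# Minimal sketch: loading a GPTQ-quantized model onto a CUDA GPU via transformers.
# Assumes `pip install transformers optimum auto-gptq`; the model id below is
# just an example placeholder for any GPTQ-quantized repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example repo, swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda:0")

inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```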

GGML is a tensor library and file format designed for CPU inference (including Apple M-series chips), but it can also offload some layers to the GPU.
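Here is a rough sketch of what that CPU-plus-offload setup looks like with llama-cpp-python. The model path and layer count are placeholders, and GPU offload only works if the package was built with GPU support.

```python
# Minimal sketch: CPU inference with optional GPU layer offload via llama-cpp-python.
# Assumes `pip install llama-cpp-python`; the model path and layer count are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b.Q4_K_M.gguf",  # hypothetical local model file
    n_gpu_layers=32,  # 0 = pure CPU; >0 offloads that many layers to the GPU
    n_ctx=2048,       # context window size
)

out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```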

GGUF: https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md
GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
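Per the spec linked above, a GGUF file starts with the ASCII magic "GGUF" followed by a little-endian version number and two counts, so you can sanity-check a file with a few lines of Python. The field layout below follows my reading of the spec for GGUF v2/v3, and the path is a placeholder.

```python
# Minimal sketch: reading a GGUF header to check the magic and version.
# Layout per the linked spec (GGUF v2/v3): 4-byte magic "GGUF", uint32 version,
# uint64 tensor count, uint64 metadata key/value count. Path is a placeholder.
import struct

def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic was {magic!r})")
        version, = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, metadata_kv_count

print(read_gguf_header("./llama-2-7b.Q4_K_M.gguf"))
```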

HemanthSai7 changed discussion status to closed
