Could you make an exl2 quant for the weighted/imatrix version?

#1
by mjh657 - opened

Just wondering.

What do you mean exactly? This is the exl2 version of the model :)

I meant this version: mradermacher/IceCaffeLatteRP-7b-i1-GGUF. Sorry if this makes no sense, I don't know a whole lot about how these are made.

Gotcha, yeah, so that GGUF is made using an imatrix (importance matrix) to measure the effect of quantization on each weight, so the conversion from the original model weights knows which ones it can target more aggressively and which to preserve more carefully.
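For reference, this is roughly how an imatrix GGUF like that one gets made with llama.cpp's tools (a sketch; the calibration file and model file names here are placeholders, and older builds name these binaries `imatrix` and `quantize`):

```bash
# Run the fp16 model over a calibration corpus and record
# how much each weight contributes (the importance matrix)
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize with the imatrix so the more important weights
# are kept at higher precision
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```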

ExllamaV2 already uses a similar method of measuring the effect of quantization on the weights; the measurement.json you see in the main repo is the equivalent of the imatrix that llama.cpp creates.
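For comparison, a sketch of the ExllamaV2 two-pass flow (directory paths and the 5.0 bpw target are placeholder values): convert.py first calibrates and writes measurement.json, then quantizes using it:

```bash
# Pass 1: measure per-layer quantization error, write measurement.json
python convert.py -i ./model-fp16 -o ./work -om measurement.json

# Pass 2: quantize to the target bitrate, reusing the measurement
python convert.py -i ./model-fp16 -o ./work -m measurement.json \
    -cf ./model-5.0bpw-exl2 -b 5.0
```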

Point is, this basically already is an "imatrix" quant, so you're good to go :)
