Could you make an exl2 quant for the weighted/imatrix version?

#1
by mjh657 - opened

Just wondering.

What do you mean exactly? This is the exl2 version of the model :)

I meant this version: mradermacher/IceCaffeLatteRP-7b-i1-GGUF. Sorry if this makes no sense, I don't know a whole lot about how these are made.

Gotcha, yeah, so that GGUF is made using an imatrix (importance matrix) to measure the effect of quantization on each weight, so the conversion from the original model weights knows which ones it can target more aggressively and which to preserve more carefully.
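For reference, this is roughly how an imatrix GGUF like that one gets made with llama.cpp's tools (a sketch; the calibration file and model file names here are placeholders, and older builds name these binaries `imatrix` and `quantize`):

```bash
# Run the fp16 model over a calibration corpus and record
# how much each weight contributes (the importance matrix)
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize with the imatrix so the more important weights
# are kept at higher precision
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```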

ExllamaV2 already uses a similar method of measuring the effect of quantization on the weights; the measurement.json you see in the main repo is the equivalent of the imatrix that llama.cpp creates.
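For comparison, a sketch of the ExllamaV2 two-pass flow (directory paths and the 5.0 bpw target are placeholder values): convert.py first calibrates and writes measurement.json, then quantizes using it:

```bash
# Pass 1: measure per-layer quantization error, write measurement.json
python convert.py -i ./model-fp16 -o ./work -om measurement.json

# Pass 2: quantize to the target bitrate, reusing the measurement
python convert.py -i ./model-fp16 -o ./work -m measurement.json \
    -cf ./model-5.0bpw-exl2 -b 5.0
```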

Point is, this basically already is an "imatrix" quant, so you're good to go :)
