Are the .dat Imatrix files needed alongside the models?

#7
by Xonaz81 - opened

Since these quant types are still in active development and so new, many programs don't even ship the latest llama.cpp. But I was wondering: do we need to provide llama.cpp with both the model file and the .dat (imatrix) file for better perplexity, or is that not how this works?

p.s. - opened this question in a new discussion to keep it organised.

You can make your own quants with the imatrix files. It should be possible to make 3-bit, 4-bit etc. GGUFs.

KnutJaegersberg changed discussion status to closed

You don't need the imatrix files for inference. If you want to make your own quants with them, you have to convert the HF model to a GGUF, and then you can use the imatrix files to make better quants.
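For reference, the workflow described above looks roughly like this with the llama.cpp tools (a minimal sketch assuming a local llama.cpp checkout with the tools built; the model paths, file names, and quant type are illustrative, not specific to this repo):

```shell
# 1. Convert the Hugging Face model to a full-precision GGUF.
#    (Path to the downloaded HF model directory is an example.)
python convert.py ./my-hf-model --outfile model-f16.gguf

# 2. Quantize, passing the downloaded imatrix .dat file so the
#    quantizer can weight important tensors more carefully.
./quantize --imatrix imatrix.dat model-f16.gguf model-IQ2_XS.gguf IQ2_XS
```

The resulting quantized GGUF is what you load for inference; the .dat file is only consumed at quantization time.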

@KnutJaegersberg Okay thanks, that is what I figured. I got very inconsistent results, but now I understand why. Many uploaders of these new quant types either don't use an importance matrix or generate it the wrong way, hence the perplexity is really bad, and so are the model responses. How did you create the imatrix files?

I merely used 100 chunks with random words. Sometimes that seems to result in bad output, but it is rather infrequent, and it does not appear to bias output the way normal datasets like WikiText do.
That said, I guess they can be even better if one uses the specific fine-tuning datasets of each model, so the bias is "useful".

For the German models I simply used the German Evol-Instruct dataset. However, with larger datasets, making those imatrices can take much longer; the same goes for using more chunks.
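Generating an imatrix like the one described above is done with llama.cpp's imatrix tool (a minimal sketch assuming a built llama.cpp checkout; the calibration file name is illustrative, e.g. random words or a fine-tuning dataset exported as plain text):

```shell
# Run the model over the calibration text and record per-tensor
# activation statistics; --chunks limits how much text is processed.
./imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat --chunks 100
```

The resulting imatrix.dat is then passed to the quantizer via --imatrix. Larger calibration sets or more chunks improve coverage but increase runtime, which matches the trade-off mentioned above.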


@KnutJaegersberg

Thanks for your reply! As an AI researcher and consultant, I am diving into the imatrix concepts and will do some deeper research and experimentation soon. I am fairly sure we can improve the quality of these new quant formats even further by using slightly better algorithms and best practices for generating the importance matrix. This will require a lot of testing and experiments, but it is well worth it! Thanks a lot for your efforts so far in uploading these new quants and making them available to the masses.

Yesterday I heard there is already a pull request for 1.5-bit quants in llama.cpp (though in reality it is more like 1.8). I hope the community can make these work at least as well as 4-bit quants, so that using advanced LLMs is possible on consumer hardware. I think it is important for democracy to counteract the centralization efforts around this technology and source of power.
If access to this power is restricted to the few, it will consume democracy from within the institutions. I think it is more sustainable for democracy to tolerate a little more chaos instead, though this seems to be becoming an unpopular opinion and there is no real political support for it. We live in times of cowardice.
