How to use this?

#1
by steampunk333 - opened

I'm familiar with gguf by itself, but how do I use this imatrix with the assembled file? Do I even have to? What does it do?

You just load it with a newer l.cpp

Don't think Q3 will fit much context though, sadly. At least not without flash attention like exl2.
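For a rough sense of why context eats memory on top of the weights, here's a back-of-the-envelope sketch of KV cache size. It assumes a standard transformer with an unquantized fp16 cache, and the model dimensions are hypothetical placeholders, not this model's real values. Swap in the numbers from the model's config to get a real estimate.

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical dimensions for illustration only (not read from any real model).
example = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_ctx in (4096, 16384, 32768):
    gib = kv_cache_bytes(n_ctx=n_ctx, **example) / 2**30
    print(f"n_ctx={n_ctx:>6}: ~{gib:.2f} GiB of fp16 KV cache")
```

The cache grows linearly with context length, so whatever memory is left after loading the quantized weights caps how much context fits.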

I was referring more to Q8_0 (running in DRAM).

Also, what's l.cpp?

llama.cpp
