Please reconvert to new GGML format

#6
by Delta36652 - opened

llama.cpp now includes GPU offloading support, but it requires the model file to be in the new GGML file format.

Updating today

Can't wait!

Second this. Please convert to GGML3 with the new K Quants.

I tried to do k-quants for this model myself the other day because I was asked to, but it's not currently possible.

There's currently an issue that prevents making k-quants with certain models: those with tensors whose dimensions aren't divisible by 256.

That affects two types of Llama models:

  • Ones with a vocab size of 32001 instead of 32000 (because of an added PAD token, which I think was an early hack that got copied even where it's not needed)
  • Models based on OpenAssistant which have a vocab of 32016 tokens.

This model is an example of the latter, so it won't be possible to make k-quants until this is resolved: https://github.com/ggerganov/llama.cpp/issues/1919#issuecomment-1599484900
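
For anyone who wants to check the arithmetic, here is a rough Python sketch of the divisibility rule described above. It is not llama.cpp's actual check: the names `K_QUANT_BLOCK` and `k_quant_compatible` are made up for illustration, the hidden size of 4096 is an assumption, and testing every dimension is a simplification of whatever the quantizer really does.

```python
# Illustrative only: not taken from llama.cpp.
# Block size of 256 comes from the thread; everything else is an assumption.
K_QUANT_BLOCK = 256


def k_quant_compatible(shape):
    """Return True if every dimension in `shape` is a multiple of 256."""
    return all(dim % K_QUANT_BLOCK == 0 for dim in shape)


# Hypothetical hidden size, used only to build an example tensor shape.
HIDDEN_SIZE = 4096

for vocab_size in (32000, 32001, 32016):
    shape = (vocab_size, HIDDEN_SIZE)
    remainder = vocab_size % K_QUANT_BLOCK
    print(vocab_size, remainder, k_quant_compatible(shape))
```

Under these assumptions, only the 32000 vocab passes: 32001 leaves a remainder of 1 and 32016 leaves a remainder of 16.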
