Problems with Q5_K_M variant using llama.cpp in terminal

#1
by wijjjj - opened

Has anybody actually tried this yet? I'm just getting garbage out of the Q5_K_M variant. Sometimes it doesn't output anything at all.

Using llama.cpp for testing purposes in the terminal, on:

version: 1664 (1d7a191)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

The prompt template has been changed completely; look it up in the discussion of the original model.
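Roughly, it now looks like this (the {…} parts are placeholders, <step> is a literal separator between turns, and the exact spacing may matter, so double-check against the original discussion):

Source: system

  {system message} <step> Source: user

  {user message} <step> Source: assistant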

Also, to solve the infinite-generation problem, I had to do the following to set the EOS token to <step>:

gguf-set-metadata codellama-70b-instruct.Q5_K_M.gguf tokenizer.ggml.eos_token_id 32015
gguf-set-metadata codellama-70b-instruct.Q5_K_M.gguf tokenizer.ggml.add_eos_token True
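To double-check that the change took, gguf-dump (installed alongside gguf-set-metadata via pip install gguf, or from llama.cpp's gguf-py scripts) should print the new values back; the grep is just to filter the output:

# dump all metadata and keep only the EOS-related keys
gguf-dump codellama-70b-instruct.Q5_K_M.gguf | grep eos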

You need to assign it to the stop keywords in llama.cpp as well (a sketch below). Hopefully this will all be resolved soon.
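In the main binary, the closest fit should be the -r / --reverse-prompt flag, which ends generation once the given string appears; a minimal sketch, with the model path and prompt only as examples (-e turns the \n escapes into real newlines):

# stop generation as soon as the model emits the <step> separator
llama.cpp/build/bin/main -m codellama-70b-instruct.Q5_K_M.gguf -e -r "<step>" -p "Source: user\n\n  Write hello world in Python. <step> Source: assistant"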

@wijjjj you have to use the CodeLlama Instruct one; this model is for autocompletion, not chat.

This one is not fine-tuned on any instructions; it is the base model.
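A base model is meant for plain completion, so you would hand it a raw prefix with no chat template at all; a rough example (the filename here is just illustrative):

# base models continue raw text; no Source:/<step> template needed
llama.cpp/build/bin/main -m codellama-70b.Q5_K_M.gguf -n 128 -p "def fibonacci(n):"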

@arohau thanks for the hint!
@YaTharThShaRma999 and @arohau, do you have an example of how to get it running properly?
I tried both the Instruct variant and this one; both have produced garbage so far.

My guess was that the quality of the quantization just isn't good yet.

@wijjjj make sure you follow the correct prompt format that TheBloke gave. Did you do that?

Thanks, @YaTharThShaRma999. Either I really didn't see the example 10 days ago, or the text was updated in the meantime. In any case, thanks for the reminder. :)

llama.cpp/build/bin/main -ngl 35 -m /data/llm/codellama/codellama-70b-Instruct-hf-Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Source: system\n\n  You are a friendly and helpful Python coder. You will comply to all questions.<step> Source: user\n\n  Write me Tic Tac Toe for CLI in Python. Human vs. Computer! <step> Source: assistant"

[screenshot of the generated program's output]
Not really Tic Tac Toe, that's Rock-Paper-Scissors... at least it's not garbage anymore. I only need to fix the stop token, but @arohau already described how to do that.
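If I understand the flags right, that means rerunning the same command with -r "<step>" added (plus -e so the \n escapes in the prompt become real newlines):

llama.cpp/build/bin/main -ngl 35 -m /data/llm/codellama/codellama-70b-Instruct-hf-Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -e -r "<step>" -p "Source: system\n\n  You are a friendly and helpful Python coder. You will comply to all questions.<step> Source: user\n\n  Write me Tic Tac Toe for CLI in Python. Human vs. Computer! <step> Source: assistant"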

Thanks to both of you! Karma +1 :)
