Problems with Q5_K_M variant using llama.cpp in terminal

#1
by wijjjj - opened

Has anybody actually tried this yet? I'm just getting garbage out of the Q5_K_M variant. Sometimes it doesn't output anything at all.

Using llama.cpp for testing purposes in the terminal, on:

version: 1664 (1d7a191)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

The prompt template has been changed completely; look it up in the discussion of the original model.
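Roughly, it now looks like this (the {…} parts are placeholders, <step> is a literal separator between turns, and the exact spacing may matter, so double-check against the original discussion):

Source: system

  {system message} <step> Source: user

  {user message} <step> Source: assistant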

Also, to solve the infinite-generation problem, I had to do the following to set the EOS token to <step>:

gguf-set-metadata codellama-70b-instruct.Q5_K_M.gguf tokenizer.ggml.eos_token_id 32015
gguf-set-metadata codellama-70b-instruct.Q5_K_M.gguf tokenizer.ggml.add_eos_token True
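To double-check that the change took, gguf-dump (installed alongside gguf-set-metadata via pip install gguf, or from llama.cpp's gguf-py scripts) should print the new values back; the grep is just to filter the output:

# dump all metadata and keep only the EOS-related keys
gguf-dump codellama-70b-instruct.Q5_K_M.gguf | grep eos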

You need to assign it to the stop keywords in llama.cpp as well (a sketch below). Hopefully this will all be resolved soon.
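In the main binary, the closest fit should be the -r / --reverse-prompt flag, which ends generation once the given string appears; a minimal sketch, with the model path and prompt only as examples (-e turns the \n escapes into real newlines):

# stop generation as soon as the model emits the <step> separator
llama.cpp/build/bin/main -m codellama-70b-instruct.Q5_K_M.gguf -e -r "<step>" -p "Source: user\n\n  Write hello world in Python. <step> Source: assistant"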

@wijjjj you have to use the CodeLlama Instruct one; this model is for autocompletion, not chat.

This one is not fine-tuned on any instructions; it is the base model.
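A base model is meant for plain completion, so you would hand it a raw prefix with no chat template at all; a rough example (the filename here is just illustrative):

# base models continue raw text; no Source:/<step> template needed
llama.cpp/build/bin/main -m codellama-70b.Q5_K_M.gguf -n 128 -p "def fibonacci(n):"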

@arohau thanks for the hint!
@YaTharThShaRma999 and @arohau, do you have an example of how to get it running properly?
I tried both the Instruct variant and this one; both have produced garbage so far.

My guess was that the quality of the quantization just isn't good yet.

@wijjjj make sure you follow the correct prompt format that TheBloke gave. Did you do that?

Thanks, @YaTharThShaRma999. Either I really didn't see the example 10 days ago, or the text was updated in the meantime. In any case, thanks for the reminder. :)

llama.cpp/build/bin/main -ngl 35 -m /data/llm/codellama/codellama-70b-Instruct-hf-Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Source: system\n\n  You are a friendly and helpful Python coder. You will comply to all questions.<step> Source: user\n\n  Write me Tic Tac Toe for CLI in Python. Human vs. Computer! <step> Source: assistant"

[screenshot of the generated program's output]
Not really Tic Tac Toe, that's Rock-Paper-Scissors... at least it's not garbage anymore. I only need to fix the stop token, but @arohau already described how to do that.
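If I understand the flags right, that means rerunning the same command with -r "<step>" added (plus -e so the \n escapes in the prompt become real newlines):

llama.cpp/build/bin/main -ngl 35 -m /data/llm/codellama/codellama-70b-Instruct-hf-Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -e -r "<step>" -p "Source: system\n\n  You are a friendly and helpful Python coder. You will comply to all questions.<step> Source: user\n\n  Write me Tic Tac Toe for CLI in Python. Human vs. Computer! <step> Source: assistant"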

Thanks to both of you! Karma +1 :)
