Model answer ends in a repeating word

#1
by mrichardt - opened

E.g. in LM Studio (0.1.11):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Introduce yourself."}],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'

I get the following response:

[2023-08-04 18:33:17.287] [INFO] Generated prediction: {
"id": "chatcmpl-anobk33c2ezhuggk932",
"object": "chat.completion",
"created": 1691166667,
"model": "/Users/martinrichardt/.cache/lm-studio/models/TheBloke/vicuna-13B-v1.5-16K-GGML/vicuna-13b-v1.5-16k.ggmlv3.q4_1.bin",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "\nMy name is Anastasiya. I am a student of master's program in the field of marketing at the University of Economics in Varna, Bulgaria. I have always been interested in the world of business and how it can affect the economy. That's why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why why ...

I also tried llama.cpp with similar results.
Does anyone have a solution for this?

This happens when the RoPE settings aren't correct.

In llama.cpp try:
-c 16384 --rope-freq-base 10000 --rope-freq-scale 0.25 for 16K context, or:
-c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5 for 8K context.
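
To make the relationship explicit: both settings work out to a native context of 4096 tokens, i.e. --rope-freq-scale is the native context divided by the target context. Here's a minimal sketch (the 4096 figure is inferred from the two values above, not taken from the model card):

```python
# Sketch: derive llama.cpp RoPE flags for a linearly scaled context.
# Assumes a native (pre-scaling) context of 4096 tokens, which is what
# the 0.5/8K and 0.25/16K values above both work out to.

NATIVE_CTX = 4096

def rope_flags(target_ctx: int, freq_base: int = 10000) -> str:
    """Build the llama.cpp flag string for the given target context."""
    freq_scale = NATIVE_CTX / target_ctx  # 8192 -> 0.5, 16384 -> 0.25
    return f"-c {target_ctx} --rope-freq-base {freq_base} --rope-freq-scale {freq_scale}"

print(rope_flags(16384))  # -c 16384 --rope-freq-base 10000 --rope-freq-scale 0.25
print(rope_flags(8192))   # -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5
```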

I don't know how this is applied in LM Studio; there might not be an option for it yet. Check the settings for anything mentioning rope frequency base and rope frequency scale.

Thank you, that worked out great!

I've been setting it to 4 or 8 for 16k and 32k. Thank you so much for this!
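
In case it helps anyone else: assuming the "4 or 8" here refers to a linear RoPE scaling factor (as some front ends expose it), it should just be the reciprocal of llama.cpp's --rope-freq-scale. A quick sketch of the mapping:

```python
# Sketch: map a linear RoPE scaling factor (an assumption about what
# "4 or 8" refers to) onto llama.cpp's --rope-freq-scale.
for factor, target_ctx in ((4, "16k"), (8, "32k")):
    print(f"factor {factor} ({target_ctx}): --rope-freq-scale {1 / factor}")
# factor 4 (16k): --rope-freq-scale 0.25
# factor 8 (32k): --rope-freq-scale 0.125
```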

Saving this to the notes by commenting.
