
repeats the same word in the output

#1
by eraldohug - opened

Running the model in oobabooga/text-generation-webui with the q4_K_M and q5 quants, the output is always constant repetition after just three or four words, even in chat or notebook mode.
Example:
Input context:
In photosynthetic bacteria, the proteins that gather light for photosynthesis are embedded in cell membranes. In its simplest form, this involves the membrane surrounding the cell itself.[19]
However, the membrane may be tightly folded into cylindrical sheets called thylakoids,[20] or bunched up into round vesicles called intracytoplasmic membranes.[21]
Model output:
In photosynthetic bacteria, the proteins that are involved in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in

You have to use the startup parameters --rope-freq-scale 0.25 and -c 16384:

-c 16384 because Llama 2's default context is 4096 (-c 4096) and this model supports 16K.
--rope-freq-scale 0.25 because the model has 4x the context (1/4 = 0.25).
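For reference, a minimal llama.cpp invocation along those lines might look like this (the model filename and prompt are placeholders, adjust them to your local setup):

```sh
# Load the 16K-context GGUF with 4x RoPE frequency scaling (placeholder filename)
./main -m ./models/llama-2-16k.Q4_K_M.gguf \
  -c 16384 \
  --rope-freq-scale 0.25 \
  -p "In photosynthetic bacteria, the proteins that gather light for photosynthesis"
```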

Yeah, this repeating-word behaviour happens when the RoPE frequency scale is set wrong. I mentioned this in the README under the llama.cpp command, but I guess it needs to be clearer.

I haven't checked text-generation-webui's llama.cpp loader recently, but presumably it has parameters for that.

But you set -c to the desired context, I believe? So -c 16384 to make use of the full 16K.

Oops ... you are right ... corrected ;D

Works now.
In oobabooga/text-generation-webui, the parameter compress_pos_emb must be set to 8.
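For anyone launching from the command line instead of the UI, a rough sketch of the equivalent startup flags would be something like the following (flag names can differ between text-generation-webui versions, and the model directory name is a placeholder):

```sh
# Placeholder model directory; compress_pos_emb 8 matches the setting mentioned above
python server.py --model llama-2-16k-GGUF \
  --loader llama.cpp \
  --n_ctx 16384 \
  --compress_pos_emb 8
```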
