Model repeats the same word in the output
Running the model in oobabooga/text-generation-webui
using the q4_K_M and q5 quants, the output is always constant repetition after just three or four words, even in chat or notebook mode.
Example:
Input context:
In photosynthetic bacteria, the proteins that gather light for photosynthesis are embedded in cell membranes. In its simplest form, this involves the membrane surrounding the cell itself.[19]
However, the membrane may be tightly folded into cylindrical sheets called thylakoids,[20] or bunched up into round vesicles called intracytoplasmic membranes.[21]
Model output:
In photosynthetic bacteria, the proteins that are involved in in in in in in in in in in in in in in in in in in in in in in in in in [the word "in" repeats like this until the generation limit]
You have to use the startup parameters --rope-freq-scale 0.25 and -c 16384:
-c 16384 because Llama 2's default context is -c 4096
--rope-freq-scale 0.25 because this model has 4x the base context
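For anyone copying this, here is a minimal llama.cpp command-line sketch with those flags. The binary name (main vs. llama-cli, depending on your build) and the model filename are placeholders, not the exact ones from this repo:

```sh
# Hypothetical sketch: adjust the binary name and model path to your setup.
# -c 16384 requests the full 16K context (Llama 2's default is 4096);
# --rope-freq-scale 0.25 matches the 4x context extension.
./main -m model-q4_K_M.gguf -c 16384 --rope-freq-scale 0.25 \
  -p "In photosynthetic bacteria, the proteins that gather light for photosynthesis are embedded in cell membranes."
```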
Yeah, this repeated-word behaviour happens when the rope frequency scale is set wrong. I mentioned this in the README under the llama.cpp command, but I guess it needs to be clearer.
I haven't checked text-generation-webui's llama.cpp loader recently - presumably it has parameters for that
But you set -c to the desired context I believe? So -c 16384 to make use of the full 16K
Oops ... you are right ... corrected ;D
Works now.
In oobabooga/text-generation-webui, the parameter compress_pos_emb must be set to 8.
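If you launch the webui from the command line rather than setting it in the UI, something along these lines should apply the same setting. This is only a sketch and assumes your text-generation-webui version exposes the --compress_pos_emb and --n_ctx flags for the llama.cpp loader (check python server.py --help); the model filename is a placeholder:

```sh
# Hypothetical invocation: flag availability varies by webui version.
python server.py --loader llama.cpp --model model-q4_K_M.gguf \
  --n_ctx 16384 --compress_pos_emb 8
```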