Note about the `--rope-freq-scale` argument.
Though the README.md explicitly states:
"Change `-c 4096` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically."
However, in my experience, I still had to explicitly add `--rope-freq-scale 0.125` to get sensible output, even after passing `-c 32768` as an argument. Without that, I was only getting spaces and newlines in the output (probably token 0?).
Perhaps this deserves an explicit call-out in the README.md?
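For anyone wondering where 0.125 comes from: assuming the model was extended with linear RoPE scaling from Llama 2's native 4096-token context to 32768 tokens, the value is simply the ratio of the two lengths. A minimal sketch of that arithmetic follows; the function name `linear_rope_freq_scale` is my own and not part of llama.cpp.

```python
def linear_rope_freq_scale(trained_ctx: int, target_ctx: int) -> float:
    """Linear RoPE frequency scale for running a model at target_ctx tokens.

    Assumes plain linear position interpolation: positions are compressed
    by trained_ctx / target_ctx so they stay inside the trained range.
    """
    if trained_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    # No scaling is needed when the target fits in the trained window.
    return min(1.0, trained_ctx / target_ctx)


# Llama 2 base context (4096) extended to 32K (32768).
scale = linear_rope_freq_scale(4096, 32768)
print(f"--rope-freq-scale {scale}")  # prints "--rope-freq-scale 0.125"
```

The same ratio would give 0.5 for an 8K variant and 0.25 for a 16K variant, under the same assumption of a 4096-token base and linear scaling.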
So this works? I still get blank output with colons and dashes even when setting `--rope-freq-scale 0.125`. How are you passing the instruction?
Like so:
./main -t 24 -m ./llama-2-7b-32k-instruct.Q8_0.gguf.1 --color -c 32768 --temp 0 --repeat_penalty 1 -n 512 -p "[INST] Describe in detail the timeline of the moon landing. [/INST]" --rope-freq-scale 0.125
The output is legible.
OK, I'll try and update.
@BlahBlah1 did that work?
nope