TheBloke/Llama-2-7B-32K-Instruct-GGUF · Note about the `--rope-freq-scale` argument.

Sep 23, 2023

The README.md explicitly states:

Change -c 4096 to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

However, in my experience, I still had to explicitly add --rope-freq-scale 0.125 to get sensible output, even after passing -c 32768 as an argument. Without it, the output contained only spaces and newlines (probably token 0?).
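For anyone wondering where 0.125 comes from, here is a rough sketch of the arithmetic. It assumes the base Llama 2 model was trained with a 4096-token context and that the 32K variant uses linear RoPE scaling, so the scale would be the ratio of the two context lengths. The variable names are only illustrative, and the snippet prints the flag rather than running llama.cpp.

```shell
# Assumption: base Llama 2 training context is 4096 tokens and the
# 32K fine-tune uses linear RoPE scaling (position interpolation).
train_ctx=4096
target_ctx=32768

# Under linear scaling, the frequency scale is train_ctx / target_ctx.
scale=$(awk -v a="$train_ctx" -v b="$target_ctx" 'BEGIN { printf "%g", a / b }')

# Print the flag to append to the ./main command line.
echo "--rope-freq-scale $scale"
```

Under those assumptions this gives 4096 / 32768 = 0.125, matching the value that produced sensible output here. If a different target context is used, the same ratio would presumably apply, but that is untested.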

Perhaps this deserves an explicit call-out in the README.md?

BlahBlah1

Sep 29, 2023

So this works? I still get blank output with colons and dashes, even when setting --rope-freq-scale 0.125. How are you passing the instruction?

musicallyut

Oct 1, 2023

Like so:

./main -t 24 -m ./llama-2-7b-32k-instruct.Q8_0.gguf.1 --color -c 32768 --temp 0 --repeat_penalty 1 -n 512 -p "[INST] Describe in detail the timeline of the moon landing. [/INST]" --rope-freq-scale 0.125

The output is legible.

BlahBlah1

Oct 5, 2023

OK, I'll try it and update.

tocof44188alibrscoM

Feb 19, 2024

@BlahBlah1 did that work?

BlahBlah1

Mar 28, 2024

nope