Note about the `--rope-freq-scale` argument.

#1
by musicallyut - opened

Though the README.md explicitly states:

Change -c 4096 to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

However, in my experience, I had to still explicitly add --rope-freq-scale 0.125 to get sensible output, even after passing -c 32768 as an argument. Without that, I was only getting spaces and new-lines in the output (probably token 0?).

Perhaps this deserves an explicit call-out in the README.md?

so this works? cause I still get blank output with colons and dashes even with setting --rope-freq-scale 0.125. How are you passing the instruction?

Like so:

./main -t 24 -m ./llama-2-7b-32k-instruct.Q8_0.gguf.1 --color -c 32768 --temp 0 --repeat_penalty 1 -n 512 -p "[INST] Describe in detail the timeline of the moon landing. [/INST]" --rope-freq-scale 0.125

The output is legible.

ok ill try and update

@BlahBlah1 that worked ?

Sign up or log in to comment