GGUF parameters suggestion

#1 by lazyDataScientist

These are my recommended settings for the Q4 GGUF file.
Parameters

temperature = 0.67
top_p = 1
top_k = 0
repetition_penalty = 1.5

Parameters related to the model when loaded into memory

compress_pos_emb = 8  # (also known as linear RoPE scaling, I believe)
rope_freq_base = 450000
n_ctx = 32768
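
For anyone loading the GGUF programmatically, here is a minimal sketch of how these settings might map onto llama-cpp-python. The model filename is a placeholder, and the translation of compress_pos_emb = 8 into rope_freq_scale = 1/8 is how text-generation-webui expresses linear scaling, as far as I know:

```python
from llama_cpp import Llama

# Load the Q4 GGUF with the context-extension settings above.
# NOTE: the model path is a placeholder, and compress_pos_emb = 8
# (text-generation-webui) corresponds to rope_freq_scale = 1/8 here.
llm = Llama(
    model_path="aurelian-70b.Q4_K_M.gguf",  # placeholder filename
    n_ctx=32768,
    rope_freq_base=450000,
    rope_freq_scale=0.125,  # linear RoPE scaling factor of 8
)

# Sample with the suggested generation settings.
out = llm(
    "Write the opening scene of a mystery novel.",
    max_tokens=2048,
    temperature=0.67,
    top_p=1.0,
    top_k=0,  # 0 disables top-k filtering in llama.cpp
    repeat_penalty=1.5,
)
print(out["choices"][0]["text"])
```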

With these settings I get a very coherent story for 1,000-2,000 tokens. After the 2,000-2,500 token mark it gradually starts to make things up, but it keeps a solid story structure.

Hope this helps!!

I use temperature = 0.7, top_p = 0.8, top_k = 90, repetition_penalty = 1.16, repetition_penalty_range = 4096, though it probably works for a broad range of values. ~2,000 tokens is about the max you can get out of any model these days while following a given prompt/outline, I think. But Aurelian is trained for multi-round use, so you can keep going pretty much forever; you just need to prompt and redirect every 2K tokens or so.
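
If you are running the GGUF directly rather than through text-generation-webui, a rough sketch of these values in llama-cpp-python might look like the following. Mapping repetition_penalty_range onto last_n_tokens_size (the lookback window for the repeat penalty) is my assumption, and the model path is again a placeholder:

```python
from llama_cpp import Llama

# ASSUMPTION: repetition_penalty_range = 4096 is approximated here by
# last_n_tokens_size, the lookback window for the repeat penalty.
llm = Llama(
    model_path="aurelian-70b.Q4_K_M.gguf",  # placeholder filename
    n_ctx=32768,
    last_n_tokens_size=4096,
)

out = llm(
    "Write the opening scene of a mystery novel.",
    max_tokens=2048,
    temperature=0.7,
    top_p=0.8,
    top_k=90,
    repeat_penalty=1.16,
)
print(out["choices"][0]["text"])
```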

Here is an example of a one-shot story. Most of the model's training is actually not for one-shot stories but for scene-by-scene writing, so you can continue the story by following up with the next scene, and it will maintain consistency across the entire 32K context window.
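
To illustrate the scene-by-scene pattern, here is a rough continuation loop reusing the llm instance from the sketch above. The bare "story so far + next scene direction" framing is a stand-in of mine; use whatever prompt format the model card specifies:

```python
# Scene-by-scene continuation: keep the whole transcript in the prompt
# so the model can stay consistent across the 32K window.
# ASSUMPTION: this plain concatenated framing is a stand-in; follow the
# prompt format from the model card instead.
scene_directions = [
    "Scene 1: The detective arrives at the manor.",
    "Scene 2: She interviews the groundskeeper.",
    "Scene 3: A second body is discovered.",
]

story = ""
for direction in scene_directions:
    prompt = f"{story}\n\n{direction}\n"
    out = llm(prompt, max_tokens=2048, temperature=0.7,
              top_p=0.8, top_k=90, repeat_penalty=1.16)
    story += f"\n\n{direction}\n" + out["choices"][0]["text"]

print(story)
```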
