This model is mind-

#2
by Novelo - opened

I wonder what settings everyone is using. I have tried a few, mostly lowering repetition and playing with min_p and the like.


With the second version of this model, I always used the Shortwave preset in oobabooga/ST, and it worked very well for me. I would recommend trying it with this one too.

v3 has this weird quirk: the longer the context gets, the higher I want to push min P. I would start at ~0.07-0.08 for contexts up to 8k, ~0.10 for 8k-16k, and ~0.12-0.15 for 16k+. This seems to work better than repetition penalty, because anything already in the context becomes more likely to be repeated, and raising min P as the buffer grows can knock the model out of its obsession with the context. Repetition penalty should do that too, but I found it made the model too unhinged. On the other hand, leaving min P at 0.15 on a new chat makes it act very stupid up front. MAYBE creatively stupid, but not always: mostly just regular stupid.
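To make the schedule above concrete, here is a minimal sketch. It assumes "8k" and "16k" mean 8,192 and 16,384 tokens and that each band starts at its lower boundary; both are my assumptions, not part of the original advice. The `min_p_filter` helper (a hypothetical name) shows the standard min-p rule: keep only tokens whose probability is at least `min_p` times the top token's probability, then renormalise. That rule is why a higher min P prunes more of the long tail.

```python
def min_p_range(context_tokens: int) -> tuple[float, float]:
    """Suggested (low, high) min P for a given context length.

    Values follow the thresholds in the post; the exact token counts
    used as band boundaries (8,192 / 16,384) are assumptions.
    """
    if context_tokens < 8_192:
        return (0.07, 0.08)
    if context_tokens < 16_384:
        return (0.10, 0.10)
    return (0.12, 0.15)


def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Standard min-p filtering over a token probability distribution.

    Tokens below min_p * (top probability) are zeroed out and the
    survivors are renormalised so they sum to 1.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    print(min_p_range(20_000))       # long chat: (0.12, 0.15)
    print(min_p_filter(probs, 0.08))  # threshold 0.04: every token survives
    print(min_p_filter(probs, 0.15))  # threshold 0.075: the 0.05 token is dropped
```

With the toy distribution, raising min P from 0.08 to 0.15 removes the least likely token entirely. That is the lever being pulled as the context fills up.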

At the same time, dynamic temp seems to need scaling down as the context grows. So I would start at 0.5-2.0 with an empty buffer and let it roll. At 8k+, I might bump the max down to 1.8, and approaching 16k, I would push it further down to 1.5.
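Here is a rough sketch of the same idea for dynamic temp. The per-band (min, max) values come from the post. The 8,192 / 16,384 token boundaries are assumed, and since the post says "approaching 16k", you may want to drop to 1.5 a bit earlier. The `dynamic_temperature` function is an assumption too: it uses one common entropy-based formulation, where a flat (uncertain) distribution gets a temperature near the max and a confident one gets a temperature near the min. Your frontend's implementation may differ in its details.

```python
import math


def dynatemp_range(context_tokens: int) -> tuple[float, float]:
    """(min_temp, max_temp) by context length, per the post.

    Band boundaries (8,192 / 16,384 tokens) are assumptions.
    """
    if context_tokens < 8_192:
        return (0.5, 2.0)
    if context_tokens < 16_384:
        return (0.5, 1.8)
    return (0.5, 1.5)


def dynamic_temperature(probs: list[float], min_temp: float,
                        max_temp: float, exponent: float = 1.0) -> float:
    """Entropy-scaled temperature (an assumed formulation).

    The distribution's entropy is normalised by its maximum possible
    value, log(vocab size), and used to interpolate between min_temp
    and max_temp.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    norm = entropy / max_entropy if max_entropy > 0 else 0.0
    return min_temp + (max_temp - min_temp) * norm ** exponent


if __name__ == "__main__":
    lo, hi = dynatemp_range(10_000)                        # (0.5, 1.8)
    print(dynamic_temperature([0.25] * 4, lo, hi))         # uniform: close to 1.8
    print(dynamic_temperature([1.0, 0.0, 0.0], lo, hi))    # certain: 0.5
```

Lowering only the max as the context grows leaves confident predictions alone and reins in the high-entropy steps, where long chats tend to drift.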

I found this model's "tolerance" much more similar to Mixtral's, in that small changes in parameters led to big, unexpected changes in behavior. And I did feel like I had to tweak it constantly.

I have not tried it since "smoothing" became available in ST.
