Fantastic, thank you very much
I'm gonna try it tonight, for a trip down memory lane! :D
We will all be disappointed. So disappointed.
Actually, other than the ridiculously small 8k context, it follows orders better than most newer stuff (e.g. magnum). Maybe I should resurrect some other corpses on my disk (venus, lzlv :)
Yeah, I fondly remember Airoboros 1.4.1 33b Lxctx with its 16k context in the summer of 2023; it was quite context-obedient and coherent.
I kept it on the side. This 65b is on par from what I can see, though its PI method is less efficient at reaching a low perplexity: optimal perplexity comes with a linear rope scale of 0.35-0.4 instead of 0.25. That means some more context loss, but it's enough to test the quality of the model at 5-6k context. You can add a rope base frequency of 18000 on top of that to get back to 8-9k.
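A quick back-of-the-envelope sketch of where those numbers come from (my own illustration, assuming LLaMA-1's 2048 native context, a 128 head dimension, and the common NTK-aware rule of thumb; treating the linear scale and the base-frequency bump as multiplicative is a crude estimate, so usable context in practice lands a bit lower, around the 8-9k above):

```python
# Rough effective-context estimates for a LLaMA-1 65b (2048 native context)
# under linear RoPE (PI) scaling plus an NTK-style base-frequency bump.
# Illustration only; the stacking of the two factors is an approximation.

NATIVE_CTX = 2048   # LLaMA-1 pretraining context (assumed)
HEAD_DIM = 128      # per-head dimension for the 65b model (assumed)

def linear_ctx(rope_scale: float) -> float:
    """Linear (PI) scaling: positions are compressed by rope_scale,
    so the window grows by the inverse of the scale factor."""
    return NATIVE_CTX / rope_scale

def ntk_factor(freq_base: float, default_base: float = 10000.0) -> float:
    """NTK-aware rule of thumb: multiplying the RoPE base by b
    stretches the usable window by roughly b ** ((d - 2) / d)."""
    b = freq_base / default_base
    return b ** ((HEAD_DIM - 2) / HEAD_DIM)

for scale in (0.25, 0.4, 0.35):
    print(f"linear scale {scale}: ~{linear_ctx(scale):.0f} tokens")
# 0.25 -> ~8192; 0.35-0.4 -> ~5-6k, matching the numbers above.

# Stacking a base of 18000 on top of the 0.35-0.4 linear scale:
for scale in (0.4, 0.35):
    est = linear_ctx(scale) * ntk_factor(18000)
    print(f"scale {scale} + base 18000: ~{est:.0f} tokens")
# Prints roughly 9-10k; real-world usable context degrades before the
# theoretical limit, hence the more conservative 8-9k figure.
```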