How to use 128k context?
How do I use the 128k context with llama.cpp?
I would also like to know, since the example in the readme only sets the context length to 4096.
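For reference, a minimal sketch of raising the context length with llama.cpp's `main` example (the model filename below is a placeholder; `-c`/`--ctx-size` is the standard context-size flag):

```sh
# Sketch: run llama.cpp's main example with a larger context window.
# The model path is a placeholder -- substitute your actual file.
# -c / --ctx-size sets the context length in tokens (the readme example uses 4096).
./main -m ./models/model-128k.gguf -c 16384 -p "Your prompt here"
```

Note that just raising `-c` past the context the model was trained for won't give good results unless the matching RoPE scaling is applied too.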
llama.cpp currently does not support the YaRN RoPE scaler.
So how exactly can we use this model's 128k context?
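Until YaRN support lands in llama.cpp, the closest workaround seems to be linear RoPE scaling via the existing `--rope-freq-scale` flag. This is not equivalent to YaRN, so output quality will likely suffer; a sketch, assuming a 4096-token base context extended to 16384:

```sh
# Workaround sketch (NOT real YaRN): linear RoPE scaling.
# --rope-freq-scale = trained context / target context, e.g. 4096 / 16384 = 0.25.
# Expect degraded output compared to proper YaRN support.
./main -m ./models/model-128k.gguf -c 16384 --rope-freq-scale 0.25 -p "Your prompt here"
```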
He should probably put a note at the top of the readme saying that there is currently no way to use this.
*can't use it at its current context limit. I got it working with a smaller context (which defeats the point), but the responses were really bad. Then again, I'm a noob, so I could have been prompting the model incorrectly. Still, props for this model being one of the first open-source LLMs out of the gate with an over-100K context window (at least potentially), and I look forward to seeing more refinements here.
Excerpts from their paper:
Looks like, for coding tasks, the Code Llama models already perform very well at long contexts as-is.