context length

#1
by markenzwieback - opened

Greetings,

I got this model set up and running without issues, following the blog post, except for the context length: the blog post states a length of 4k tokens, but every time I go beyond 2k tokens (with 4k correctly set), I get the context window error from llama.cpp with this GGUF version (in text-generation-webui & SillyTavern).
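
For reference, here's roughly what the setup boils down to on my side (a minimal llama-cpp-python sketch; the model filename is a placeholder and tgw's actual loader differs):

```python
# Minimal llama-cpp-python sketch of the setup; the model filename is
# a placeholder. n_ctx is the requested context window.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=4096,                        # 4k context, as per the blog post
)

out = llm("Hello,", max_tokens=32)
print(out["choices"][0]["text"])
```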

I haven't been able to test other versions yet.

Does anyone happen to know whether this is a limitation of this GGUF version, of llama.cpp in general, or whether the error lies somewhere else?

Thanks in advance.

Is it an error, or just a warning? Does it say "warning: model might not support context sizes greater than 2048 tokens .. expect poor results"? If so, you can safely ignore it. I'm not sure why that warning is still in the code, but it doesn't apply to Llama 2 models or to any model with extended context.
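
One way to tell the two apart (a hedged llama-cpp-python sketch, with a placeholder model path): the warning is printed once at load time and is harmless, while a real overflow shows up when the prompt itself no longer fits the context that was actually loaded.

```python
# Sketch: distinguish the harmless load-time warning from a real
# overflow by checking the prompt against the context actually loaded.
from llama_cpp import Llama

llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=4096)  # placeholder path

prompt = b"..."  # the full prompt being sent
n_prompt = len(llm.tokenize(prompt))
if n_prompt > llm.n_ctx():
    print(f"prompt is {n_prompt} tokens but the context is {llm.n_ctx()}")
```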

Unfortunately, it's the error, and therefore it's not generating any output. It happens both with SillyTavern + tgw and with tgw alone.

In both instances I verified that the context length was set to 4k and that llama.cpp was used to load the model.

I get the feeling that it's an error with tgw on my end 🤔

After setting the context length in tgw, remember to reload the model. I was getting "llama_tokenize_with_model: too many tokens" in the terminal until I did that.
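
To illustrate why the reload matters (a sketch with llama-cpp-python, not tgw's actual code; same placeholder path as above): the context size is fixed when the model object is created, so changing the setting only takes effect on a fresh load.

```python
from llama_cpp import Llama

# n_ctx is fixed when the model is loaded; changing the UI setting
# afterwards does nothing until the model is loaded again.
llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=2048)

# To actually get 4k, create a new instance with the new value:
llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=4096)
```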
