context length

#1
by markenzwieback - opened

Greetings,

I got this model set up and running without issues, following the blog post, except for the context length: the blog post states a length of 4k tokens, but every time I go beyond 2k tokens (with 4k correctly set), I get the context window error from llama.cpp with this GGUF version (in text-generation-webui & SillyTavern).
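
For reference, here's roughly what the setup boils down to on my side (a minimal llama-cpp-python sketch; the model filename is a placeholder and tgw's actual loader differs):

```python
# Minimal llama-cpp-python sketch of the setup; the model filename is
# a placeholder. n_ctx is the requested context window.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=4096,                        # 4k context, as per the blog post
)

out = llm("Hello,", max_tokens=32)
print(out["choices"][0]["text"])
```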

I haven't been able to test other versions yet.

Does anyone happen to know whether this is a limitation of this GGUF version, of llama.cpp in general, or whether the error lies somewhere else?

Thanks in advance.

Is it an error, or just a warning? Does it say "warning: model might not support context sizes greater than 2048 tokens .. expect poor results"? If so, you can safely ignore it. I'm not sure why that warning is still in the code, but it doesn't apply to Llama 2 models or to any model with extended context.
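
One way to tell the two apart (a hedged llama-cpp-python sketch, with a placeholder model path): the warning is printed once at load time and is harmless, while a real overflow shows up when the prompt itself no longer fits the context that was actually loaded.

```python
# Sketch: distinguish the harmless load-time warning from a real
# overflow by checking the prompt against the context actually loaded.
from llama_cpp import Llama

llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=4096)  # placeholder path

prompt = b"..."  # the full prompt being sent
n_prompt = len(llm.tokenize(prompt))
if n_prompt > llm.n_ctx():
    print(f"prompt is {n_prompt} tokens but the context is {llm.n_ctx()}")
```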

Unfortunately, it's the error, and therefore it's not generating any output. It happens both with SillyTavern + tgw and with tgw alone.

In both instances I verified that the context length was set to 4k and that llama.cpp was used to load the model.

I get the feeling that it's an error with tgw on my end 🤔

After setting the context length in tgw, remember to reload the model. I was getting "llama_tokenize_with_model: too many tokens" in the terminal until I did that.
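
To illustrate why the reload matters (a sketch with llama-cpp-python, not tgw's actual code; same placeholder path as above): the context size is fixed when the model object is created, so changing the setting only takes effect on a fresh load.

```python
from llama_cpp import Llama

# n_ctx is fixed when the model is loaded; changing the UI setting
# afterwards does nothing until the model is loaded again.
llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=2048)

# To actually get 4k, create a new instance with the new value:
llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=4096)
```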
