Increasing the context window

#4
by maddiemii - opened

I'm loving this model; it's the best I've worked with so far. One thing I don't yet understand is how to increase the context window so that it remembers the conversation for longer. I'd love a context window of 4096 tokens or higher, but I don't understand where the limitation is. Is it in the base model itself and the way it's trained, or something else? Is it not a setting I can change? Thank you!

Great, glad to hear it.

I'm afraid the context window is baked into the model and cannot be increased. This applies to nearly all models available at the moment, including the major ones. For example, GPT-3.5 has a limit of 4096 tokens, and GPT-4 comes in two versions, one with an 8k and one with a 32k context (not many people have access to the 32k version yet, though).
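If you want to see where that number lives for an open model, it's usually right in the model's config. A quick sketch with the `transformers` library (the repo id below is just an example; 2048 is the original LLaMA value):

```python
# Check a model's trained context length from its Hugging Face config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("huggyllama/llama-7b")  # example repo id
print(config.max_position_embeddings)  # 2048 for the original LLaMA
```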

There are some new models coming out that have much longer context lengths, or methods to increase the context length. MPT, for example, can be extended up to 65k (though I believe it then has massive VRAM requirements); see the sketch below. But generally speaking, existing models have a pre-defined context length which can't be increased. LLaMA was released with a 2k context limit, and all models based on it therefore inherit that.
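MPT can do this because it uses ALiBi position biases rather than learned position embeddings, so its context length is a config value rather than a trained weight. A minimal sketch of raising it at load time, following the pattern from the mosaicml/mpt-7b-storywriter model card (you'll need the VRAM to match):

```python
# MPT's context length is configurable at load time thanks to ALiBi.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
config.max_seq_len = 65536  # raise from the training default; very VRAM-hungry
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter", config=config, trust_remote_code=True
)
```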

For existing models there are some techniques that can sometimes help. For example, LangChain has a summarisation feature: in a chat where you're asking follow-up questions, it can automatically summarise past interactions to get the most out of your limited context window.
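If it helps, here's a minimal sketch of that pattern using LangChain's `ConversationSummaryBufferMemory` (API names from the classic `langchain` package; the OpenAI wrapper below is just a stand-in for whichever LLM you're actually running):

```python
from langchain.chains import ConversationChain
from langchain.llms import OpenAI  # stand-in; any LLM wrapper works here
from langchain.memory import ConversationSummaryBufferMemory

llm = OpenAI(temperature=0)

# Keeps recent turns verbatim and summarises older ones once the
# history exceeds max_token_limit, so the prompt stays inside the
# model's fixed context window.
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=1000)

chain = ConversationChain(llm=llm, memory=memory)
chain.predict(input="Hi! Let's talk about context windows.")
chain.predict(input="Remind me what we were just discussing?")
```

The trade-off is that the summary is lossy: anything the summariser drops is gone for good, so this buys you longer conversations rather than a genuinely longer context.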

But other than that, there's not much you can do right now, I believe.
