4096 context?

#3 by david565 - opened

Meta reports that the base models support 4096 context. Is it possible to make GGML models with 4096 context?

llama.cpp:
$ ./main -c 4096 -m /media/data/llama-2-13b.ggmlv3.q6_K.bin
main: warning: base model only supports context sizes no greater than 2048 tokens (4096 specified)

Getting the same warning.

Looking at this commit: https://huggingface.co/meta-llama/Llama-2-13b-hf/commit/f3b475aaed299d2389525d6ce4e542cc438833a4

"max_position_embeddings": 2048,

3 days ago this was changed to:

"max_position_embeddings": 4096

Edit: oops that's the hf model, so I guess I'm not sure.
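
For reference, a locally downloaded copy of the HF repo can be checked for the same value with something like this (the path below is just illustrative):

$ grep max_position_embeddings /path/to/Llama-2-13b-hf/config.json
"max_position_embeddings": 4096,

It should print 2048 for a copy downloaded before the change and 4096 after.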

Yeah, I need to fix those config.json files and will do it now.

But it won't change that warning message, which is currently hardcoded into llama.cpp and can be ignored on models you know have >2048 context.

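If you want to see where it comes from, grepping a llama.cpp checkout for the message text from the warning above will find the hardcoded check (exact wording and file location may differ between versions):

$ grep -rn "context sizes no greater" .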

So to be clear: yes, my config.json files are wrong and will be updated, but that in no way affects the GGML models, which work fine at 4096 context - or even greater using RoPE scaling. And to be honest it doesn't really affect the GPTQ models either, as the value in config.json is just a default/baseline and most clients let you specify the context value independently.
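
For example, with llama.cpp the context is set on the command line regardless of config.json, so something like this should work with the q6_K file above (the --rope-freq-base / --rope-freq-scale flags are only needed to push beyond the native 4096, and their availability and best values depend on your llama.cpp version):

$ ./main -c 4096 -m /media/data/llama-2-13b.ggmlv3.q6_K.bin
$ ./main -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5 -m /media/data/llama-2-13b.ggmlv3.q6_K.bin

In both cases the 2048-token warning can simply be ignored.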
