200k -> 4k

#2
by ssaroya - opened

Hi, quick question: I was wondering whether the 4k context restriction is something that can be easily removed?

What do you mean? This model supports up to 200K context length in principle. You can hardly fit the full 200K context, though, as it would cost around 40GB of VRAM or so, I believe. So with most model loaders like vLLM, you need to set `max_model_len` to something like 8192 for this model, which is large enough for most tasks.
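
For reference, this is where the cap is set when loading with vLLM's Python API. A minimal sketch, assuming vLLM is installed; the model id below is just a placeholder, not this repo:

```python
from vllm import LLM, SamplingParams

# Load the model with the context window capped at 8192 tokens so the
# KV cache fits in VRAM, instead of the full 200K the model supports.
llm = LLM(model="your-org/your-200k-model", max_model_len=8192)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize this document: ..."], sampling)
print(outputs[0].outputs[0].text)
```

The equivalent server flag is `--max-model-len 8192` when launching the vLLM API server.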

(screenshot: image.png)
So this here

I believe this is just a typo; it should be 200K. @TheBloke