Context length is not 128k

#41 opened by pseudotensor

vLLM uses a default of 8k, and I can't make it use 128k (a sketch of what I'm trying is below).

https://github.com/vllm-project/vllm/issues/3676
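
For reference, this is roughly the invocation that fails (minimal sketch assuming the standard vLLM Python API; the model name is a placeholder): vLLM refuses a `max_model_len` larger than the length it derives from the model's `config.json`.

```python
# Minimal sketch, assuming the standard vLLM Python API.
# "org/model-name" is a placeholder, not this repo's actual id.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-name",   # placeholder model id
    max_model_len=131072,     # request the advertised 128k context
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```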

You can: just change the config.json.
But a 128k context would take over 130 GB of VRAM on its own; I can only fit 64k in 96 GB.
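
For a sense of where numbers like that come from, here is a back-of-the-envelope KV-cache estimate. The layer and head counts below are illustrative placeholders, not read from this model's config; plug in the real `num_hidden_layers`, `num_key_value_heads`, and head dim to get the actual figure.

```python
def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_param=2):
    # K and V caches: 2 tensors per layer, each n_kv_heads * head_dim per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_param
    return per_token * seq_len / 1024**3

# Illustrative dims (fp16, no GQA): 40 layers, 64 KV heads, head_dim 128
print(kv_cache_gib(131_072, 40, 64, 128))  # ~160 GiB for one 128k sequence
print(kv_cache_gib(65_536, 40, 64, 128))   # ~80 GiB for 64k
```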

As I argue in that vLLM thread, I don't think that's how it should be done. One shouldn't just change the position-embedding size, since rope scaling is used; the scaling should be part of the calculation.
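
Sketching what "part of the calculation" could look like (assuming the usual Hugging Face `config.json` fields; whether this model actually ships a `rope_scaling` block is an assumption): the serving framework would derive the usable context from the base `max_position_embeddings` times the RoPE scaling factor, rather than requiring the base value itself to be edited.

```python
import json

def derived_max_len(config_path: str) -> int:
    # Hypothetical helper: effective context = base max_position_embeddings,
    # stretched by the rope_scaling factor when linear/dynamic scaling is set.
    with open(config_path) as f:
        cfg = json.load(f)
    base = cfg["max_position_embeddings"]
    factor = (cfg.get("rope_scaling") or {}).get("factor", 1.0)
    return int(base * factor)

print(derived_max_len("config.json"))  # e.g. 8192 * 16 -> 131072
```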
