Update max_position_embeddings to 16384

#8
by cjdonahoe-ms - opened

I'm running this model on a vLLM server and I am receiving the following error: <Server response: {'code': None, 'message': "This model's maximum context length is 4096 tokens. However, you requested 4348 tokens (4332 in the messages, 16 in the completion). Please reduce the length of the messages or completion.", 'object': 'error', 'param': None, 'type': 'invalid_request_error'}>

Since this is the 16k version of Vicuna-13b-v1.5, the maximum context length should be 16384. The max_position_embeddings value in config.json is the only place left where 4096 is still mentioned, so I assume that parameter is what is causing the server error.
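
For reference, here is a minimal sketch (not part of the original report) of how the effective context length relates to the config values; it assumes the `transformers` library and the field names used in the linked config.json, where max_position_embeddings was 4096 and rope_scaling.factor was 4.0 at the time:

```python
# Minimal sketch: derive the effective context length from the published config.
# Assumes the `transformers` library; field names follow the linked config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("lmsys/vicuna-13b-v1.5-16k")

base_len = config.max_position_embeddings              # 4096 at the time of this report
rope_scaling = getattr(config, "rope_scaling", None)   # e.g. {"type": "linear", "factor": 4.0}
factor = rope_scaling["factor"] if rope_scaling else 1.0

print(f"effective context length: {int(base_len * factor)}")  # 4096 * 4.0 = 16384
```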

Large Model Systems Organization org
edited Oct 23, 2023

The rope scaling factor is already specified here. I think vLLM should probably use this to decide the context length?
https://huggingface.co/lmsys/vicuna-13b-v1.5-16k/blob/17c61f9ca19f5a7a04e96b2cc0d9bcf2920cb8c2/config.json#L22
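
If the vLLM version in use does not apply that factor automatically, one possible workaround (a sketch, assuming a vLLM build that accepts `max_model_len` and enough GPU memory for a 16k context) is to set the context length explicitly:

```python
# Workaround sketch: pass the intended 16k context length explicitly instead of
# relying on vLLM to multiply max_position_embeddings by the rope scaling factor.
# Assumes a vLLM version that exposes the `max_model_len` argument.
from vllm import LLM, SamplingParams

llm = LLM(model="lmsys/vicuna-13b-v1.5-16k", max_model_len=16384)

params = SamplingParams(max_tokens=16)
outputs = llm.generate(["Hello, Vicuna."], params)
print(outputs[0].outputs[0].text)
```

The OpenAI-compatible server reportedly accepts the same setting via a `--max-model-len 16384` flag, which may be the more direct fix for the server setup described in the original report.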

weichiang changed pull request status to closed
