Max length 2048 error

#5
by abhatia2 - opened

Hey,
I am getting this error for large inputs:
"{"error":"Input validation error: inputstokens +max_new_tokensmust be <= 2048. Given: 2037inputstokens and 400max_new_tokens","error_type":"validation"}"

Llama 2 models have a 4096-token context length; is this something that can be configured during deployment?

By the way, I am getting this error with other quantized models too, such as TheBloke/Llama-2-70B-Chat-GPTQ.

I was able to resolve this by setting the MAX_TOTAL_TOKENS parameter, as mentioned in the docs: https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher#maxtotaltokens
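
In case it helps others: a minimal sketch of how these launcher parameters can be passed as environment variables, assuming a deployment of the TGI container through the SageMaker Python SDK (the model id, instance type, and token values below are placeholders, not something from this thread):

```python
# Minimal sketch, assuming a SageMaker deployment of the TGI container.
# The env var names mirror the launcher flags from the linked docs
# (--max-input-length / --max-total-tokens); model id, instance type,
# and token values are placeholders to adjust for your own deployment.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # assumes an execution role is configured

# Hugging Face LLM (TGI) container image for the current region
llm_image = get_huggingface_llm_image_uri("huggingface")

model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "TheBloke/Llama-2-70B-Chat-GPTQ",  # example model id
        "QUANTIZE": "gptq",
        # Raise the limits to use Llama 2's full 4096-token context window.
        "MAX_INPUT_LENGTH": "3696",   # max prompt tokens per request
        "MAX_TOTAL_TOKENS": "4096",   # prompt + generated tokens per request
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # example instance type
)
```

Note that the launcher also enforces a separate input-length limit: input tokens plus max_new_tokens must fit within MAX_TOTAL_TOKENS, so MAX_INPUT_LENGTH usually needs to be raised alongside it.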
