About Input validation error: `inputs` tokens + `max_new_tokens` must be <= 1512.

#47 opened by Holynull

I keep getting the following error from an AWS SageMaker inference endpoint. When I use LangChain to create a QA chain, its prompt template generally has more than 1,000 tokens. How can I solve this issue?

{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 1512. Given: 1682 `inputs` tokens and 200 `max_new_tokens`","error_type":"validation"}

I also encountered the same problem. Did you find a solution?

Having the same issue with meta-llama/Llama-2-13b-chat-hf. Where is this 1512 limitation coming from?

Support explained: you can configure these limits when creating your endpoint. In the Inference Endpoints UI, the advanced configuration section has two fields, "Max Input Length" and "Max Number of Tokens"; the latter is where the 1512 limit comes from. Change it to suit your needs. I find it tricky to figure out which values work for the model and the platform; so far it's been trial and error.
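That answer covers the Hugging Face Inference Endpoints UI. For the SageMaker deployment in the original question, the same limits are set through environment variables on the Hugging Face LLM (TGI) container, which map to TGI's `--max-input-length` and `--max-total-tokens` server arguments. A rough sketch; the instance type, GPU count, and token limits here are assumptions you would tune for your model and hardware:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface"),
    env={
        "HF_MODEL_ID": "meta-llama/Llama-2-13b-chat-hf",
        "SM_NUM_GPUS": "4",          # shard across the instance's GPUs
        "MAX_INPUT_LENGTH": "3072",  # max tokens allowed in the prompt
        "MAX_TOTAL_TOKENS": "4096",  # prompt tokens + generated tokens
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)
```

Whatever values you pick, `MAX_TOTAL_TOKENS` must stay within the model's context window (4096 for Llama 2), and larger limits need more GPU memory, which is why finding workable values often takes some trial and error.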
