Input validation error: `max_new_tokens` must be <= 1. Given: 20

#12
by reubenlee3 - opened

I'm using an Inference Endpoint to quickly get a model up and running for testing. Below is my container configuration, but I can't make heads or tails of the error: `Input validation error: max_new_tokens must be <= 1. Given: 20`. I'm using GPU medium (Nvidia A10G) as the instance type.

[screenshot: container configuration]

Any thoughts?

Hi @reubenlee3 , I think you need to set "Max Number of Tokens (per Query)" to at least Max Input Length (per Query) + `max_new_tokens` -- let me know if that solves the issue!
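The arithmetic behind this can be sketched as follows (the parameter names mirror the text-generation-inference `--max-input-length` / `--max-total-tokens` launcher settings; the specific values are illustrative assumptions, not taken from the screenshot):

```python
def max_new_tokens_allowed(max_total_tokens: int, max_input_length: int) -> int:
    # The room left for generation is the total token budget
    # minus the budget reserved for the input prompt.
    return max_total_tokens - max_input_length

# Illustrative values: if the total budget is only 1 token larger than
# the input budget, requests asking for max_new_tokens=20 fail with
# "max_new_tokens must be <= 1".
print(max_new_tokens_allowed(1025, 1024))  # 1

# Raising the total budget to input budget + 20 leaves room
# for 20 generated tokens.
print(max_new_tokens_allowed(1044, 1024))  # 20
```

In other words, the validation error reports exactly this difference as the upper bound, so the fix is to grow the total-token setting rather than shrink `max_new_tokens`.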
