Incomplete Output even with max_new_tokens

#37
by Pradeep1995 - opened

So the output of my finetuned openchat model ends abruptly, and I would ideally like it to finish the paragraph/sentence/code block it was in the middle of.
I have set max_new_tokens = 300 and also instructed the model in the prompt to limit its answer to 300 words.

The response is always long and ends abruptly. Is there any way to get a complete output within the desired number of output tokens?

Generation stops as soon as the specified max_new_tokens limit is reached, without checking whether the current sentence is complete.
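One common workaround (a sketch, not anything openchat-specific): since `max_new_tokens` is a hard token budget (and 300 tokens is usually fewer than 300 English words), generate with a small margin above your target and then trim the decoded text back to the last sentence terminator, so the visible answer never ends mid-sentence. The `trim_to_last_sentence` helper below is hypothetical post-processing applied to whatever `model.generate` returns after decoding:

```python
import re

def trim_to_last_sentence(text: str) -> str:
    """Cut a possibly-truncated generation back to its last complete
    sentence, i.e. one ending in '.', '!', or '?' (optionally followed
    by a closing quote or bracket). If no sentence terminator is found,
    return the text unchanged."""
    matches = list(re.finditer(r'[.!?]["\')\]]*', text))
    if not matches:
        return text
    return text[: matches[-1].end()].rstrip()

# Example with a generation cut off mid-sentence:
raw = ("Tokenizers split text into subwords, so a 300-token budget "
       "is fewer than 300 words. The model was cut off mid")
print(trim_to_last_sentence(raw))
```

This keeps the response within the token budget at the cost of discarding the trailing fragment. Separately, it can help to verify that your finetuned model still emits the EOS token (via `eos_token_id` in the generation config), otherwise generation will always run to the token limit.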
