Not able to have an output with a smaller size than the given `max_length`

#3
by Loheek - opened

Hello, when using the 3B model, I am not able to get an output shorter than the given max_length.
It always gives a correct answer in the first tokens, and then outputs garbage to fill the remaining space until max_length is reached.

I use the code in generate_openelm.py as a template.
I tried adjusting the generation options described there, like the length penalty or changing the generation strategy, but without success.

Is it possible? Any advice would be very welcome.
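
For reference, this is roughly what I am doing. A minimal sketch adapted from generate_openelm.py; the checkpoint and tokenizer names, prompt, and token budget are just placeholders, and I set eos_token_id explicitly in the hope that generation can end before the cap:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tokenizer comes from Llama 2, as generate_openelm.py does (assumed repo name).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
model.eval()

prompt = "Once upon a time there was"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens caps the output length; eos_token_id should let generation
# stop earlier if the model ever emits the Llama 2 end-of-sequence token.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```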

It seems that Apple has not released a chat template, so currently it only works for plain text generation (using Llama 2's tokenizer).
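
Since the base model may never emit an EOS token in plain text-generation mode, one workaround is a custom stopping criterion that cuts generation when some stop string appears. A rough sketch, continuing from the snippet above (the stop string "\n\n" is an arbitrary choice, not something the model card prescribes):

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnString(StoppingCriteria):
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens and stop once the string appears.
        new_text = self.tokenizer.decode(input_ids[0][self.prompt_len:])
        return self.stop_string in new_text

stop = StoppingCriteriaList(
    [StopOnString(tokenizer, "\n\n", inputs["input_ids"].shape[1])]
)
output_ids = model.generate(**inputs, max_new_tokens=256, stopping_criteria=stop)
```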
