Not able to have an output with a smaller size than the given `max_length`

#3
by Loheek - opened

Hello, when using the 3B model, I am not able to get an output shorter than the given max_length.
It always gives a correct answer in the first tokens, and then outputs garbage to fill the remaining space until max_length is reached.

I use the code in generate_openelm.py as a template.
I tried adjusting the generation options described there, like the length penalty or changing the generation strategy, but without success.

Is it possible? Any advice would be very welcome.
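
For reference, this is roughly what I am doing. A minimal sketch adapted from generate_openelm.py; the checkpoint and tokenizer names, prompt, and token budget are just placeholders, and I set eos_token_id explicitly in the hope that generation can end before the cap:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tokenizer comes from Llama 2, as generate_openelm.py does (assumed repo name).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
model.eval()

prompt = "Once upon a time there was"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens caps the output length; eos_token_id should let generation
# stop earlier if the model ever emits the Llama 2 end-of-sequence token.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```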

It seems that Apple has not released a chat template, so currently it only works for plain text generation (using Llama 2's tokenizer).
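
Since the base model may never emit an EOS token in plain text-generation mode, one workaround is a custom stopping criterion that cuts generation when some stop string appears. A rough sketch, continuing from the snippet above (the stop string "\n\n" is an arbitrary choice, not something the model card prescribes):

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnString(StoppingCriteria):
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens and stop once the string appears.
        new_text = self.tokenizer.decode(input_ids[0][self.prompt_len:])
        return self.stop_string in new_text

stop = StoppingCriteriaList(
    [StopOnString(tokenizer, "\n\n", inputs["input_ids"].shape[1])]
)
output_ids = model.generate(**inputs, max_new_tokens=256, stopping_criteria=stop)
```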
