Generate a little bit of content at a time

by loong - opened

The model only generates a little bit of content at a time.


You need to increase max_length, which caps the total sequence length (prompt plus generated tokens).

For example, set it when loading the model:
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", quantization_config=quantization_config, max_length=200)

Indeed. Also make sure to pass a large enough max_new_tokens to generate(), since it limits how many tokens are produced beyond the prompt.
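
A minimal sketch of that suggestion, in case it helps. The helper name generate_reply, the prompt text, and the value 512 are illustrative assumptions and not from this thread; the loading code in main() also leaves out the quantization_config used above, so add it back if you need it.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=512):
    """Generate a continuation for `prompt` (hypothetical helper for illustration)."""
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # max_new_tokens limits only the generated tokens; the prompt is not counted.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def main():
    # Imported here so the helper above can be reused without loading the model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-7b-it"  # gated checkpoint; requires accepted access
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Pass quantization_config=... here if you load the model quantized.
    model = AutoModelForCausalLM.from_pretrained(model_id)
    print(generate_reply(model, tokenizer, "Write a short poem about the sea.",
                         max_new_tokens=512))
```

Call main() to try it. Generation can still stop earlier than max_new_tokens if the model emits its end-of-sequence token.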

osanseviero changed discussion status to closed
