Generate a little bit of content at a time
#26
by loong · opened
You need to increase `max_length`. For example:

```python
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=quantization_config,
    max_length=200,
)
```
Indeed, make sure to pass a large enough `max_new_tokens` to `generate()`.
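A minimal sketch of the distinction, assuming the `transformers` library is installed: `max_length` caps the prompt plus the generated tokens together, while `max_new_tokens` caps only the newly generated tokens. Setting the limit on a `GenerationConfig` (rather than at `from_pretrained` time) keeps it next to the `generate()` call; loading the actual Gemma weights is omitted here since it requires gated access.

```python
from transformers import GenerationConfig

# max_length counts prompt + completion, so a long prompt can leave almost
# no room for new text. max_new_tokens bounds only the continuation.
config = GenerationConfig(max_new_tokens=512)
print(config.max_new_tokens)

# With a loaded model and tokenized inputs, this config is then passed as:
#   outputs = model.generate(**inputs, generation_config=config)
# so generation is no longer cut off by the default length limit.
```

Passing `max_new_tokens=512` directly as a keyword argument to `generate()` has the same effect.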
osanseviero changed discussion status to closed