Limit the number of generated tokens

#26
by sabrieyuboglu - opened

How can we limit the number of generated tokens in the call to generate?

Something like:

import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
generate_text("Hello world!", max_length=5)

Also, would be helpful to set temperature.

Thanks!

Databricks org

This is all standard Hugging Face functionality, so you can use the usual generation options here: max_new_tokens controls the number of generated tokens, and you can pass temperature= as well.
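Concretely, those generation options can be passed as keyword arguments on the pipeline call, which forwards them to generate(). A minimal sketch (the example values are illustrative; actually running the call requires downloading the 12B weights, so it is shown commented out, using the generate_text pipeline constructed in the question above):

```python
# Standard Hugging Face generation kwargs, forwarded to model.generate().
gen_kwargs = {
    "max_new_tokens": 5,   # cap on newly generated tokens (prompt not counted)
    "do_sample": True,     # temperature only has an effect when sampling is enabled
    "temperature": 0.7,    # <1.0 sharpens the distribution, >1.0 flattens it
}

# With the pipeline from the question (requires the model weights and a GPU):
# result = generate_text("Hello world!", **gen_kwargs)
# print(result[0]["generated_text"])
```

Note that max_length (used in the question) counts the prompt tokens too, which is why max_new_tokens is usually the option you want.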

Right — where in the Hugging Face docs are the options we can pass documented?

Here is a tutorial video on how to install and use it on Windows, which covers your question. Unfortunately the documentation was poor, so I had to do a lot of research.

The video includes a Gradio user-interface script and shows how to enable 8-bit loading for a speed-up and lower-VRAM quantization.

Dolly 2.0 : Free ChatGPT-like Model for Commercial Use - How To Install And Use Locally On Your PC

Databricks org

@MonsterMMORPG you're posting this in a whole lot of places. Maybe focus this where you think it clearly answers the question and summarize the answer, rather than post a link to your video. For example, I'm not clear that your video addresses this question.

Yes, in the video I have shown max_length; the video covers it.

matthayes changed discussion status to closed