Excellent work on function calling. However, how can I use a stop sequence to save on inference time and tokens?

result = pipeline(prompt, max_new_tokens=2048, stop = "", return_full_text=False, do_sample=False, temperature=0.001)[0]["generated_text"]
print (result)

Pipeline is:
pipeline = pipeline(

ValueError: The following model_kwargs are not used by the model: ['stop'] (note: typos in the generate arguments will also show up in this list)

Hi @nzaveri !

Thank you for your interest in the model! There are a couple of ways you can implement this. The easiest is to use TGI, as it accepts stop sequences as one of the arguments in the payload. You might be able to spin this up and just send REST-like requests to the endpoint with the stop sequence in the parameter dict of your payload. For the text generation pipeline, I don't believe there's an easy built-in way to set stopping criteria. You'll likely have to implement a StoppingCriteriaList that gets a StoppingCriteria passed in (where you'll specify "<bot_end>" in its tokenized form). Something like this:
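A minimal sketch of the StoppingCriteriaList approach, not a tested recipe for this particular model. The names `ends_with`, `make_stopping_criteria`, and `StopOnTokens` are made up for illustration. It assumes a batch size of 1 and assumes that "<bot_end>" tokenizes to the same ids on its own as it does inside generated text, which is worth checking with your tokenizer. Newer transformers releases document a per-sequence boolean tensor as the return value of a criterion; returning a plain bool is an assumption that should hold for a single sequence.

```python
from typing import Sequence


def ends_with(ids: Sequence[int], stop_ids: Sequence[int]) -> bool:
    """Return True when the tail of `ids` equals `stop_ids`."""
    n = len(stop_ids)
    return n > 0 and len(ids) >= n and list(ids[-n:]) == list(stop_ids)


def make_stopping_criteria(tokenizer, stop_text="<bot_end>"):
    """Build a StoppingCriteriaList that halts generation at `stop_text`."""
    # Imported here so the helper above stays usable without transformers.
    from transformers import StoppingCriteria, StoppingCriteriaList

    # Assumption: the stop text encodes to the same ids in isolation
    # as it does at the end of a generated sequence.
    stop_ids = tokenizer.encode(stop_text, add_special_tokens=False)

    class StopOnTokens(StoppingCriteria):
        def __call__(self, input_ids, scores, **kwargs):
            # Assumption: batch size 1, so only the first sequence is checked.
            return ends_with(input_ids[0].tolist(), stop_ids)

    return StoppingCriteriaList([StopOnTokens()])


# Hypothetical usage with the pipeline from the question; extra keyword
# arguments are forwarded to generate(), so `stop` is replaced by
# `stopping_criteria`:
#
# stopping = make_stopping_criteria(pipeline.tokenizer)
# result = pipeline(prompt, max_new_tokens=2048, return_full_text=False,
#                   do_sample=False, stopping_criteria=stopping)[0]["generated_text"]
```

Generation stops right after the stop tokens are produced, so the returned text may still end with "<bot_end>" and you may want to strip it.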

