use of <bot_end>
Hi,
Excellent work on function calling. However, how can I use <bot_end> as a stop sequence to speed up inference and save tokens?
result = pipeline(prompt, max_new_tokens=2048, stop="<bot_end>", return_full_text=False, do_sample=False, temperature=0.001)[0]["generated_text"]
print(result)
Pipeline is:
pipeline = pipeline(
"text-generation",
model="Nexusflow/NexusRaven-V2-13B",
torch_dtype="auto",
device_map="auto",
)
Error:
ValueError: The following model_kwargs are not used by the model: ['stop'] (note: typos in the generate arguments will also show up in this list)
Hi @nzaveri !
Thank you for your interest in the model! There are a couple of ways you can implement this. The easiest is to use TGI, which accepts stop sequences as one of the arguments in the payload. You should be able to spin it up and send REST-style requests to the endpoint with the stop sequence in the parameter dict of your payload. For the text generation pipeline, I don't believe there's an easy built-in way to set a stop sequence, which is why the `stop` kwarg is rejected. You'll likely have to build a StoppingCriteriaList containing a custom StoppingCriteria (where you'll specify "<bot_end>" in its tokenized form). Something like this: https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b/commit/072102d1d3462d9b2e18d91f4d22e894d83e7ccf
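For reference, here is a rough sketch of what that custom StoppingCriteria could look like. It is untested against NexusRaven itself, and the names `ends_with`, `build_stopping_criteria`, and `StopOnSequence` are made up for illustration. It assumes that tokenizing "<bot_end>" on its own yields the same ids the model emits mid-generation, which is worth checking since some tokenizers add a leading-space token.

```python
from typing import Sequence


def ends_with(ids: Sequence[int], stop_ids: Sequence[int]) -> bool:
    """Return True when the token ids end with the stop sequence."""
    n = len(stop_ids)
    return n > 0 and len(ids) >= n and list(ids[-n:]) == list(stop_ids)


def build_stopping_criteria(tokenizer, stop_text: str = "<bot_end>"):
    """Build a StoppingCriteriaList that halts generation after stop_text."""
    # Imported lazily so ends_with() stays usable without transformers.
    from transformers import StoppingCriteria, StoppingCriteriaList

    # Assumption: these ids match what the model produces in context.
    stop_ids = tokenizer(stop_text, add_special_tokens=False)["input_ids"]

    class StopOnSequence(StoppingCriteria):
        def __call__(self, input_ids, scores, **kwargs) -> bool:
            # input_ids has shape (batch, seq_len); this sketch only
            # inspects the first row, so it assumes batch size 1.
            return ends_with(input_ids[0].tolist(), stop_ids)

    return StoppingCriteriaList([StopOnSequence()])


# Possible usage with the pipeline from the question (not run here):
# criteria = build_stopping_criteria(pipeline.tokenizer)
# result = pipeline(prompt, max_new_tokens=2048, return_full_text=False,
#                   do_sample=False, stopping_criteria=criteria)[0]["generated_text"]
```

The pipeline should forward `stopping_criteria` to `generate`, unlike `stop`. The returned text will probably still end with "<bot_end>", so you may want to strip it afterwards.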