Problem with streaming support
I'm serving OpenOrca using HF TGI with stream=True. The problem is that the stopping sequence <|im_end|> consists of 10 tokens. If that string is split across two response chunks, it doesn't get automatically removed from the text.
I know this is a very specific case, but I'm wondering if anyone else has encountered this and managed to solve it?
You need to upgrade the transformers version. Mistral support was introduced in transformers 4.34.0, but TGI 1.1.0 depends on transformers 4.33.3. After upgrading transformers, my TGI stops without generating '<|im_end|>'.
We built a Docker image if you want to use it: zjuici/mirror.huggingface.text-generation-inference:1.1.0-transformers-4.34.1
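If upgrading is not an option, a client-side workaround is also possible. The following is only a minimal sketch, not something TGI provides: the helper name strip_stop_sequence is hypothetical, and it assumes you can get the streamed text as an iterable of string chunks. It holds back any chunk tail that could be the start of the stop sequence, so the sequence is caught even when it is split across chunks.

```python
from typing import Iterable, Iterator


def strip_stop_sequence(chunks: Iterable[str], stop: str = "<|im_end|>") -> Iterator[str]:
    """Yield streamed text, ending at the first occurrence of `stop`.

    Hypothetical helper: works on plain string chunks, so it is
    independent of whichever client produces the stream.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        idx = buffer.find(stop)
        if idx != -1:
            # Full stop sequence seen: emit the text before it and finish.
            if idx > 0:
                yield buffer[:idx]
            return
        # Find the longest tail of the buffer that is a prefix of `stop`;
        # that tail might be completed by the next chunk, so keep it back.
        hold = 0
        for k in range(min(len(stop) - 1, len(buffer)), 0, -1):
            if buffer.endswith(stop[:k]):
                hold = k
                break
        cut = len(buffer) - hold
        if cut > 0:
            yield buffer[:cut]
        buffer = buffer[cut:]
    # Stream ended without the stop sequence: flush whatever was held back.
    if buffer:
        yield buffer


if __name__ == "__main__":
    demo = ["Hello wor", "ld<|im", "_end|>"]
    print("".join(strip_stop_sequence(demo)))
```

You would wrap whatever yields your streamed token text with this generator. The trade-off is that a few characters may be delayed by one chunk whenever the tail of the output looks like the beginning of the stop sequence.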
Thanks for helping. I haven't tried it yet. I am running TGI from the official Docker image, so I'll try yours instead. Cheers
Matt
I'm curious: where is the change to the transformers version set in the image? (Docker novice.)
I can't find the Dockerfile right now, but it should be as simple as (IIRC):
FROM ghcr.io/huggingface/text-generation-inference:1.1.0
RUN python -m pip install transformers==4.34.1
Thanks!