Problem with streaming support

#17

by mattma1970 - opened Nov 10, 2023

mattma1970

Nov 10, 2023

I'm serving OpenOrca using HF TGI with stream=True. The problem is that the stopping sequence <|im_end|> consists of 10 tokens. If that string is split across two response chunks, it doesn't get automatically removed from the text.

I know this is a very specific instance, but I'm wondering if anyone else has encountered this and managed to solve it?

jlzhou

Nov 10, 2023

You need to upgrade the transformers version. Mistral support was introduced in 4.34.0, but TGI 1.1.0 depends on transformers 4.33.3. After upgrading transformers, my TGI stops without generating '<|im_end|>'.

We built a Docker image if you want to use it: zjuici/mirror.huggingface.text-generation-inference:1.1.0-transformers-4.34.1
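
If swapping the image isn't possible, the split stop sequence from the original question could also be handled on the client side. The following is only a minimal sketch, not part of TGI: `strip_stop_sequence` is a hypothetical helper that takes any iterable of streamed text chunks, holds back a trailing fragment that might be the start of the stop sequence, and ends the stream once the full sequence appears.

```python
from typing import Iterable, Iterator


def strip_stop_sequence(chunks: Iterable[str], stop: str = "<|im_end|>") -> Iterator[str]:
    """Yield streamed text with `stop` removed, even if it spans several chunks.

    Hypothetical helper for illustration; `chunks` is assumed to be the
    text pieces produced by a streaming client.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk

        # If the full stop sequence is present, emit what precedes it and finish.
        idx = buffer.find(stop)
        if idx != -1:
            if idx > 0:
                yield buffer[:idx]
            return

        # Otherwise hold back the longest tail that could still grow into `stop`.
        hold = 0
        for k in range(min(len(stop) - 1, len(buffer)), 0, -1):
            if buffer.endswith(stop[:k]):
                hold = k
                break

        emit = buffer[: len(buffer) - hold]
        if emit:
            yield emit
        buffer = buffer[len(buffer) - hold:]

    # The stream ended without a stop sequence: flush whatever was held back.
    if buffer:
        yield buffer


if __name__ == "__main__":
    # Simulated stream where the stop sequence is split across two chunks.
    simulated = ["Hello wor", "ld<|im", "_end|>"]
    print("".join(strip_stop_sequence(simulated)))
```

The trade-off is a small delay: text that looks like the beginning of the stop sequence is only released once the next chunk shows it was not one.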

mattma1970

Nov 13, 2023

Thanks for helping. I haven't tried it yet. I am running TGI from the official Docker image, so I'll try yours instead. Cheers
Matt

mattma1970 changed discussion status to closed Nov 13, 2023

mattma1970 changed discussion status to open Nov 13, 2023

mattma1970

Nov 13, 2023

I'm curious about where the change in the transformers version is set in the image? (Docker novice.)

jlzhou

Nov 13, 2023

I can't find the Dockerfile right now, but it should be as simple as (IIRC):

FROM ghcr.io/huggingface/text-generation-inference:1.1.0

RUN python -m pip install transformers==4.34.1
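
To confirm which transformers version an image ended up with, something like the sketch below could be run inside the container. It is an illustration only: `version_tuple`, `meets_minimum`, and `installed_version` are hypothetical helper names, and the 4.34.0 threshold is simply the version mentioned earlier in this thread.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.34.1' into (4, 34, 1), ignoring suffixes such as '.dev0'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "4.34.0") -> bool:
    """Return True if `version` is at least `minimum` (threshold taken from the thread)."""
    return version_tuple(version) >= version_tuple(minimum)


def installed_version(package: str = "transformers") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("transformers is not installed in this environment")
    else:
        print(f"transformers {found}; >= 4.34.0: {meets_minimum(found)}")
```

The comparison only looks at the first three numeric components, so pre-release builds such as a `.dev0` version are treated like the corresponding release.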

mattma1970

Nov 13, 2023

Thanks!
