How to enable streaming for the Phi-3 Vision model?

#15
by bhimrazy - opened

I have developed an interface to chat with this model and was exploring how to stream the output.
https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct

But I couldn't get it right.

What have you tried?

Thanks @dranger003 for the script.

I used the existing TextIteratorStreamer and got it working.


# Streaming setup
from threading import Thread

from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

# Run generation in a separate thread so the generated text
# can be fetched in a non-blocking way.
generation_kwargs = dict(
    inputs,
    streamer=streamer,
    max_new_tokens=512,
    eos_token_id=processor.tokenizer.eos_token_id,
)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

for text in streamer:
    print(text, end="", flush=True)
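Under the hood, TextIteratorStreamer is essentially a thread-safe producer/consumer queue: `generate()` pushes decoded text chunks from its worker thread, and iterating the streamer on the main thread blocks until the next chunk arrives. Here is a minimal stdlib-only sketch of that same pattern, to show why the snippet above works without blocking (`ToyStreamer` and `fake_generate` are illustrative stand-ins, not transformers APIs):

```python
import queue
from threading import Thread


class ToyStreamer:
    """Iterable that yields text chunks pushed from another thread."""

    _SENTINEL = object()  # marks end of generation

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, chunk):
        # Called from the generator thread for each new chunk.
        self._queue.put(chunk)

    def end(self):
        # Called once when generation is finished.
        self._queue.put(self._SENTINEL)

    def __iter__(self):
        # Blocks until the next chunk is available, like TextIteratorStreamer.
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            yield item


def fake_generate(streamer):
    # Stands in for model.generate(): emits chunks, then signals completion.
    for token in ["Hello", ", ", "world", "!"]:
        streamer.put(token)
    streamer.end()


streamer = ToyStreamer()
thread = Thread(target=fake_generate, kwargs={"streamer": streamer})
thread.start()

pieces = [chunk for chunk in streamer]  # consumed as they arrive
thread.join()
print("".join(pieces))  # -> Hello, world!
```

The sentinel object is what lets the consumer loop terminate cleanly; in the real streamer this corresponds to `generate()` finishing (or hitting `eos_token_id`).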

@sebbyjp, I was getting errors due to a parameter misconfiguration. It finally works now.

Awesome! Are you able to run batched inference with image inputs?

@bhimrazy Thanks, I didn't know about TextIteratorStreamer!


Thank you for the feedback! I haven't had a chance to try batched inference with image inputs yet, but I'll definitely look into it. I appreciate you bringing it up.

By the way, I have a studio deployed that you can try out. Feel free to explore it here: Deploy and Chat with PHI 3 Vision 128K Instruct.

nguyenbh changed discussion status to closed
