I want to build a chatbot using the Mixtral-8x7B-Instruct-v0.1 model, but inference is so slow that it is unusable as a chatbot. How can I fix this?

#211
by rising620 - opened

I have already tried several optimizations, such as flash attention and model quantization, but inference is still too slow for a chatbot.
Please help me figure out how to speed up this model.
