extremely variable response time for inference?

#21
opened by silvacarl

This model is awesome. But for some reason we are getting extremely variable response times for inference, anywhere between 0.40 seconds and 15 seconds on an A40.

Could this be caused by the prompt format or other inference parameters?

Berkeley-Nest org

That's interesting... I've actually never experienced this before. What inference package are you using: TGI, vLLM, or something else?

It's just something we've run into while testing other models for quite a while, so we are checking whether something weird is happening on our end. Just thought it would be good to post here and check.

BUT: I CAN TELL YOU SO FAR IT'S INSANELY AWESOME. 8-)

Like crazy accurate.

Berkeley-Nest org

Haha, thank you! I'm glad you like it! Could it also be due to the Mistral architecture itself? I'm not sure whether Mistral base / instruct have the same issue.

Yes, it could be that as well. We will check that; running additional tests now.

It's kind of interesting: it flies, then for some reason it will sit on one inference for about 15 seconds, then it flies again.

Just in case, what is the prompt format? Can you post an example?

Berkeley-Nest org

The prompt format is listed in the model card. FYI, it is:

GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
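For reference, here is a minimal sketch of feeding that format to the model with plain transformers; the checkpoint name (berkeley-nest/Starling-LM-7B-alpha) and the generation settings are assumptions for illustration, not something taken from this thread.

```python
# Minimal sketch: running the prompt format above with plain transformers.
# Checkpoint name and generation settings are assumptions, not from this thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build the multi-turn prompt exactly as shown in the model card format.
prompt = (
    "GPT4 Correct User: Hello<|end_of_turn|>"
    "GPT4 Correct Assistant: Hi<|end_of_turn|>"
    "GPT4 Correct User: How are you today?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```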

But I don't think the prompt format will change inference speed. Is it only happening for Starling and not for other Mistral-based models? That is very mysterious...

Yeah, tracing the code to see what's up; it's really weird.

Is it slow for the same prompt consistently?
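One rough way to check, as a sketch (again assuming the berkeley-nest/Starling-LM-7B-alpha checkpoint and plain transformers generation), is to time the same prompt several times and also log how many tokens each run produced:

```python
# Rough timing sketch: run the same prompt repeatedly and log latency plus
# the number of generated tokens, to see whether slow runs are consistent.
# Checkpoint name and parameters are assumptions for illustration only.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

for i in range(10):
    start = time.perf_counter()
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    elapsed = time.perf_counter() - start
    n_new = output_ids.shape[1] - inputs["input_ids"].shape[1]
    print(f"run {i}: {elapsed:.2f} s, {n_new} new tokens")
```

If the slow runs line up with runs that generate far more tokens, the variance may be in output length rather than per-token speed.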

The model is just too slow, actually. I have been testing it on an AWS instance with an NVIDIA Tesla T4 GPU, and it takes 2-3 minutes for each response. Once, it took about 9 minutes to generate a simple response. IDK what is going on, and my internet connection is good too.
