slow response

#138
by bhavanam2809 - opened

Hi, I am using Mixtral-8x7B-Instruct-v0.1 and load it with the following parameters: (torch_dtype=torch.float16, device_map="auto", trust_remote_code=True).
Responses are very slow: a single response takes more than 30 minutes.
Is there any solution for this?
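
For reference, here is a minimal sketch of the loading setup described above, plus a small check that may help narrow down the slowness. The repo id `mistralai/Mixtral-8x7B-Instruct-v0.1` and the helper names `offloaded_modules` and `load_as_described` are assumptions added for illustration; they are not from the original post. With `device_map="auto"`, layers that do not fit in GPU memory can be placed on CPU or disk, which is one possible cause of very slow generation, though nothing in this thread confirms that this is what happens here.

```python
def offloaded_modules(device_map):
    """Return the names of modules placed on CPU or disk instead of a GPU.

    `device_map` is a dict of module name -> device, in the form that
    transformers exposes as `model.hf_device_map` when a device_map is used.
    """
    return sorted(
        name for name, device in device_map.items()
        if str(device) in ("cpu", "disk")
    )


def load_as_described(model_id="mistralai/Mixtral-8x7B-Instruct-v0.1"):
    """Load the model with the parameters quoted in the post.

    The default repo id is an assumption; the post only gives the model name.
    Imports are local so the helper above works without these packages.
    """
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )
```

After loading, `offloaded_modules(load_as_described().hf_device_map)` lists any modules that ended up on CPU or disk. An empty list would suggest the slowness comes from somewhere else.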

I am having the same issue. I wonder what the setup is in the Inference API on this website. It seems very fast in comparison.
