Response time

#42
by Majidni - opened

Hi, I am using Mistral-7B-Instruct-v0.2 and running it on an NVIDIA RTX 4090. Sometimes it responds in a few seconds, and sometimes it takes up to 2 minutes. Do you have any idea why?

Pass torch_dtype=torch.bfloat16 when loading the model; it will make generation faster.
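A minimal sketch of that suggestion with the transformers library follows. It assumes `transformers`, `torch` and `accelerate` are installed (`device_map="auto"` needs `accelerate`). The helper `weight_memory_gb` and the parameter count of roughly 7.24 billion are illustrative assumptions, not part of the original reply. The helper estimates weight memory only, ignoring activations and the KV cache. It shows why the dtype matters on a 24 GB card: float32 weights alone would not fit, so some layers may end up off the GPU, while bfloat16 halves the footprint.

```python
# Bytes per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Hypothetical helper: rough size of the model weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def load_model(model_id: str = "mistralai/Mistral-7B-Instruct-v0.2"):
    # Imported lazily so the memory helper above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the change suggested in the reply
        device_map="auto",  # assumption: accelerate is installed; places weights on the GPU
    )
    return tokenizer, model


if __name__ == "__main__":
    n = 7.24e9  # assumed approximate parameter count for Mistral-7B
    print(f"float32:  {weight_memory_gb(n, 'float32'):.2f} GB")
    print(f"bfloat16: {weight_memory_gb(n, 'bfloat16'):.2f} GB")
```

With these assumptions, the weights come to about 29 GB in float32 and about 14.5 GB in bfloat16, compared with the 24 GB on an RTX 4090.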
