Latency is higher than the standard model, is this normal?

#5
by LorenzoCevolaniAXA - opened

I have been testing this model and comparing it against the original Mixtral model.
In terms of latency, this model is much slower than the original, by roughly a factor of 2.
Is this normal? The reduction in size is very nice, but the latency is now too high for something like a chatbot.
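
A minimal sketch of how a comparison like this could be timed, assuming each model is wrapped in a callable that takes a prompt and returns the generated text. The names `median_latency` and `slowdown` are hypothetical helpers, not part of any library, and the callables stand in for whatever generation code is actually used.

```python
import statistics
import time
from typing import Callable


def median_latency(generate: Callable[[str], str], prompt: str,
                   warmup: int = 1, runs: int = 5) -> float:
    """Return the median wall-clock seconds of generate(prompt) over `runs` calls."""
    # Warm-up calls are executed but not timed, so one-off setup cost is excluded.
    for _ in range(warmup):
        generate(prompt)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown(baseline_s: float, candidate_s: float) -> float:
    """Ratio of candidate latency to baseline latency (2.0 means twice as slow)."""
    return candidate_s / baseline_s
```

Both models should be measured with the same prompt, the same number of generated tokens, and the same hardware. If generation runs on a GPU, the callable should only return once the output is fully available, otherwise the timing will be too optimistic.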
