Very long response time
Hello,
After downloading and loading the model with:
from transformers import pipeline

text2text_generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    max_length=20,
)
calling text2text_generator("where is the capitale of germany?") takes 15 minutes to generate:
[{'generated_text': 'where is the capitale of germany?\n\nBerlin is the capitale of germany'}]
Am I doing something wrong? Is there a way to reduce this response time?
My setup is: MacBook Pro 2018, CPU: 2.9 GHz Intel Core i9, memory: 32 GB DDR4, graphics: Intel UHD Graphics 630 1536 MB
Thank you for your response,
If I want to deploy this model as part of an application on a server, what kind of setup should I ask for on the server?
I won't say I'm 'the' expert, but you need to look into an NVIDIA GPU. Running this stuff on a CPU is going to be painful at best. I run this sort of thing on an old 12 GB Titan, and there is still a world of difference between that and
even a decent CPU. You can get far better hardware than I have these days on a modest budget.
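To put rough numbers on the server question, here is a back-of-the-envelope sketch of how much memory the weights alone need at different precisions. The parameter count (about 7.24 billion) is an approximation I'm assuming for Mistral-7B, the 0.5 bytes per parameter for 4-bit is idealized (real quantized files are somewhat larger), and activations and the KV cache are ignored. As far as I know, the pipeline loads weights in float32 unless you ask for another dtype, which would put the model close to the limit of a 32 GB machine and likely explains part of the slowness.

```python
# Rough weights-only memory estimate; ignores activations and KV cache.
GIB = 1024 ** 3

# Assumption: approximate parameter count for a "7B" model like Mistral-7B.
APPROX_PARAMS = 7.24e9

# Bytes used to store one parameter at each precision (4-bit is idealized).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "4-bit quantized": 0.5,
}


def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return n_params * bytes_per_param / GIB


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>18}: ~{weight_memory_gib(APPROX_PARAMS, nbytes):.1f} GiB")
```

Read this way, half precision wants a card with comfortably more than the weight size in VRAM, while a quantized model can fit on something like a 12 GB card.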
Oh, and for local dev work, you might consider a model in GGUF format. It will run faster and be good enough.
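For what it's worth, a minimal sketch of the GGUF route, assuming the llama-cpp-python package is installed and that you have already downloaded a GGUF build of the model yourself. The function name and the context size are my own illustrative choices, not something from this thread, and I have not timed this on your hardware.

```python
from pathlib import Path


def generate_with_gguf(model_path: str, prompt: str, max_tokens: int = 20) -> str:
    """Generate a short completion from a local GGUF file on CPU."""
    # Fail early with a clear message if the file has not been downloaded.
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"GGUF model file not found: {model_path}")

    # Imported here so the missing-file check works without the package.
    from llama_cpp import Llama  # pip install llama-cpp-python

    # n_ctx=512 is an assumed small context window, enough for short prompts.
    llm = Llama(model_path=model_path, n_ctx=512, verbose=False)
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]
```

You would call it as generate_with_gguf("path/to/your-model.gguf", "where is the capitale of germany?"), where the path is a placeholder for wherever your GGUF file lives.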