Very long response time

#130
by farbodKMSE - opened

Hello,

After downloading and loading the model with:

from transformers import pipeline

text2text_generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    max_length=20,
)

it takes 15 minutes for text2text_generator("where is the capitale of germany?") to generate:
[{'generated_text': 'where is the capitale of germany?\n\nBerlin is the capitale of germany'}]

Am I doing something wrong? Is there a way to reduce this response time?

My setup: MacBook Pro 2018, CPU: 2.9 GHz Intel Core i9, memory: 32 GB DDR4, graphics: Intel UHD Graphics 630 1536 MB

deleted

Time to get a better machine, I think.

Thank you for your response.
If I want to deploy this model as part of an application on a server, what kind of setup should I ask for?

deleted
edited Mar 6

I won't say I'm 'the' expert, but you need to look into an NVIDIA GPU. Running this stuff on CPU is going to be painful at best. I run this sort of stuff on an old 12 GB Titan, and it's still a world of difference between that and even a decent CPU. You can get far better than I have these days without much budget.
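
To put rough numbers on the hardware question, here is a back-of-the-envelope sketch. It assumes about 7 billion parameters (going only by the "7B" in the model name) and counts just the weights, not activations or the KV cache. The function name and the precision table are my own, purely for illustration.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: ~7e9 parameters, inferred from the "7B" in the model name.
PARAMS = 7_000_000_000

# Bytes used per parameter at common precisions (illustrative table).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(n_params: int, bytes_per_param: float) -> float:
    """Return the weight size in GB (1 GB = 1e9 bytes).

    Excludes activations, KV cache, and framework overhead.
    """
    return n_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>18}: ~{weight_memory_gb(PARAMS, size):.1f} GB")
```

Under those assumptions, float32 weights come to about 28 GB, which nearly fills a 32 GB machine before anything else is loaded, half precision needs about 14 GB, and 4-bit about 3.5 GB. So a 12 GB card would want 8-bit or lower, and a server GPU should be sized with headroom above whichever figure applies.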

Oh, and for local dev work, you might consider a GGUF format. It will run faster and be good enough.
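
If you go the GGUF route, one option is the llama-cpp-python package. The sketch below is only a starting point: the file path is hypothetical (you would need to download a GGUF conversion of the model yourself), and the context size and token limit are arbitrary values I picked, not tuned settings.

```python
from pathlib import Path

# Hypothetical path: point this at a GGUF file you have downloaded yourself.
MODEL_PATH = Path("models/mistral-7b-v0.1.Q4_K_M.gguf")


def generate(prompt: str, model_path: Path = MODEL_PATH, max_tokens: int = 20) -> str:
    """Generate a completion from a local GGUF file using llama-cpp-python."""
    # Imported lazily so this file still loads if llama-cpp-python is not installed.
    from llama_cpp import Llama

    # n_ctx=512 is an arbitrary small context window chosen for this sketch.
    llm = Llama(model_path=str(model_path), n_ctx=512, verbose=False)
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]


if __name__ == "__main__":
    if MODEL_PATH.exists():
        print(generate("where is the capitale of germany?"))
    else:
        print(f"No GGUF file at {MODEL_PATH}; download one first.")
```

How much faster this is on a CPU-only laptop depends on the quantization level and the machine, so it is worth timing it yourself before relying on it.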
