Very long response time

#130 opened by farbodKMSE

Hello,

After downloading and loading the model with:

from transformers import pipeline

text2text_generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    max_length=20,  # note: max_length counts the prompt tokens too
)

it takes 15 minutes for text2text_generator("where is the capitale of germany?") to generate: [{'generated_text': 'where is the capitale of germany?\n\nBerlin is the capitale of germany'}]

Am I doing something wrong? Is there a way to reduce this response time?

My setup is: MacBook Pro 2018, CPU: 2.9 GHz Intel Core i9, memory: 32 GB DDR4, graphics: Intel UHD Graphics 630 1536 MB.

deleted

Time to get a better machine, I think.

Thank you for your response.
If I want to deploy this model as part of an application on a server, what kind of setup should I ask for?

deleted
edited Mar 6

I won't say I'm 'the' expert, but you need to look into an NVIDIA GPU. Running this stuff on a CPU is going to be painful at best. I run this sort of thing on an old 12 GB Titan, and it's still a world of difference between that and even a decent CPU. You can get far better than I have these days for not much budget.
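
For reference, a minimal sketch of what GPU loading could look like with transformers. The half-precision and device settings here are my assumptions (for a CUDA card with enough VRAM), not something the thread prescribes:

import torch
from transformers import pipeline

# Half precision roughly halves memory use and speeds up GPU inference
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,  # assumption: a GPU with fp16 support
    device_map="auto",          # place the model on the available GPU
)
print(generator("Where is the capital of Germany?", max_new_tokens=20))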

Oh, and for local dev work, you might consider a GGUF format. It will run faster and be good enough.
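
Something like this with llama-cpp-python, for example. The model file name is a placeholder for whatever quantized GGUF conversion you download; check what's actually published:

from llama_cpp import Llama

# Assumes a 4-bit GGUF file already downloaded locally (hypothetical file name)
llm = Llama(model_path="./mistral-7b-v0.1.Q4_K_M.gguf", n_ctx=2048)
out = llm("Where is the capital of Germany?", max_tokens=20)
print(out["choices"][0]["text"])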

I used a Google Colab GPU and it took 10 minutes to generate the string 'A list of colors: red, blue, green, yellow, orange, purple, pink,' with:

model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs, pad_token_id=tokenizer.eos_token_id)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
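
If VRAM on the free Colab GPU is the bottleneck, 4-bit quantization via bitsandbytes may help; this is a sketch of one option, not something suggested in the thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit weights shrink the 7B model to a few GB of VRAM
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)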
