Slow generation

#6
by darkandpure - opened

I was trying a prompt in which I feed in a codebase, and I found that generation takes a very long time on a GPU with 22 GB of memory, whereas Mistral took far less time. What is the reason for this, and what can be done to improve latency?

Hugging Face H4 org

Hello @darkandpure can you please share a code snippet of what you're running for inference, along with the tokens / s you're getting? It would also be useful to know what hardware you're running on. Thank you!
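
In case it helps with reporting the tokens / s figure, here is a minimal sketch of one way to measure it. The helper name `tokens_per_second` and the stand-in `fake_generate` are hypothetical and only for illustration; the sketch assumes your generation call returns the full sequence of token ids (prompt plus completion), as `model.generate(...)[0]` does in `transformers`.

```python
import time


def tokens_per_second(generate, prompt_token_count):
    """Time one call to `generate` and report generation throughput.

    `generate` is any zero-argument callable that returns the full sequence
    of token ids (prompt + completion). `prompt_token_count` is the number
    of prompt tokens, which is subtracted so only new tokens are counted.
    Returns (new_tokens, elapsed_seconds, tokens_per_s).
    """
    start = time.perf_counter()
    output_ids = generate()
    elapsed = time.perf_counter() - start

    new_tokens = len(output_ids) - prompt_token_count
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return new_tokens, elapsed, rate


# Stand-in for a real model call (assumption: replace this with something like
# lambda: model.generate(**inputs, max_new_tokens=128)[0]).
def fake_generate():
    time.sleep(0.01)          # pretend the model is working
    return list(range(30))    # 10 "prompt" ids + 20 "generated" ids


new_tokens, seconds, rate = tokens_per_second(fake_generate, prompt_token_count=10)
print(f"{new_tokens} new tokens in {seconds:.3f} s ({rate:.1f} tokens/s)")
```

With a very long prompt such as a whole codebase, a large part of the measured time may be spent processing the prompt before the first new token appears, so it is worth reporting the prompt length in tokens alongside the tokens / s number.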
