What hardware do I need for reasonable performance?

by TS0001 - opened

Awesome work @TheBloke! Thank you.

I have this running on runpod.io with Text Generation UI, on an A100 with 80 GB VRAM, 125 GB RAM, and 16 vCPUs. Performance is quite slow. I'm wondering if anyone has it running with reasonable performance, and if so, on what hardware?


I think the issue is that AutoGPTQ is slow, but I don't know enough about it; I'm only going by what I've read from other people.

I get ~2 t/s on my 3090 with this model, which I consider reasonable for the setup (WSL2). :)

@mancub how much VRAM does your 3090 have? Thanks

@TS0001 how many tokens/sec do you get on the A100? Thanks
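
To make t/s numbers comparable across setups, it may help to measure them the same way: newly generated tokens divided by wall-clock time. Below is a minimal sketch, not tied to AutoGPTQ or any particular UI. The `generate` callable and `fake_generate` are hypothetical stand-ins; swap in whatever call your backend uses, as long as it returns only the new tokens and not the prompt.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[str], Sequence[int]], prompt: str) -> float:
    """Time one call to `generate` and return generated tokens per second.

    `generate` is a hypothetical stand-in: it takes a prompt and returns
    only the newly generated token ids (prompt tokens excluded).
    """
    start = time.perf_counter()
    new_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed


def fake_generate(prompt: str) -> Sequence[int]:
    # Placeholder backend (assumption): pretends to emit 20 tokens in ~0.1 s.
    time.sleep(0.1)
    return list(range(20))


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "Hello")
    print(f"{rate:.1f} t/s")
```

The first call on a real backend often includes warm-up cost, so averaging a few runs after the first gives a fairer number.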

What is the fastest way to run this model on GPU?
