GPU requirements

#59
by thightower1

I have the demo code working, but it's very slow. My PC has a fast processor and an NVIDIA RTX 4060 with 8 GB of VRAM, but I can't seem to find the minimum GPU memory requirements anywhere.
Any suggestions would be greatly appreciated!

Running inference only (no fine-tuning) with 4-bit quantization took around 12.2 GB of VRAM for me.
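
In case it helps, 4-bit loading with transformers + bitsandbytes generally looks something like this. This is a minimal sketch, not my exact code: `model_id` is a placeholder for the actual checkpoint, and it assumes `bitsandbytes` and `accelerate` are installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder -- replace with the repo id of the model this discussion is about.
model_id = "your-org/your-model"

# 4-bit NF4 quantization; float16 compute works on older GPUs (e.g. Colab T4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place layers across GPU/CPU
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, layers that don't fit on the GPU get offloaded to CPU RAM, which is slower but can avoid a CUDA out-of-memory error on smaller cards.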

Thanks @leo4life. Just curious: how was the performance with 4-bit quantization?

I didn't run any benchmarks, but I tried the prompts from the sample code on the model card page and got the same outputs. It gives reasonable output for other prompts as well.

Hi @leo4life, I tried to run it in 4-bit on Colab but got a CUDA out-of-memory error. Would you share a code snippet to run it? I might be missing an important parameter.
