What are the hardware requirements for this? I am running out of memory on my RTX3060 Ti :o

#21
by yramshev - opened

PC:
16 GB RAM
NVIDIA RTX 3060 Ti
AMD Ryzen 5 3600

I'd guess it needs around 8.7 GB of VRAM, and the 3060 Ti only has 8 GB. The vicuna-7B model should work fine, though.

Yeah, I don't think you can get away with using a 13B model on an 8 GB card. A 7B model should be fine.
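
For a rough sense of the numbers, here's a back-of-envelope estimate (just a sketch: the ~2 GB overhead figure is an assumption, and actual usage depends on the quantization format, group size, and context length):

```python
# Back-of-envelope VRAM estimate for 4-bit quantized LLM weights.
# Assumption: 0.5 bytes per parameter for the weights, plus roughly
# 2 GB of overhead for activations, KV cache, and CUDA buffers.

def approx_vram_gb(n_params_billion: float, bits: int = 4, overhead_gb: float = 2.0) -> float:
    weights_gb = n_params_billion * 1e9 * (bits / 8) / 1024**3
    return weights_gb + overhead_gb

for size in (7, 13):
    print(f"{size}B @ 4-bit: ~{approx_vram_gb(size):.1f} GB")
# 7B  @ 4-bit: ~5.3 GB  -> fits on an 8 GB card
# 13B @ 4-bit: ~8.1 GB  -> right at or over the limit of an 8 GB card
```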

Alternatively, check out the GGML version of this model and try it with the new GPU-accelerated llama.cpp. That lets you offload as many layers to the GPU as you have VRAM for, and runs the rest on the CPU. Early reports are that it performs very well. And the new GPU inference is now supported in text-generation-webui.
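
Here's a minimal sketch of that partial offload using llama-cpp-python (the Python bindings for llama.cpp), assuming it was installed with GPU (cuBLAS) support; the model filename and the layer count below are placeholders you'd adjust for your own files and card:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./model.ggml.q4_0.bin",  # placeholder path to a local GGML file
    n_gpu_layers=32,  # offload as many layers as fit in your 8 GB of VRAM
    n_ctx=2048,       # context window
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If you hit out-of-memory errors, lower n_gpu_layers until it fits; I believe the equivalent setting in text-generation-webui's llama.cpp loader is its n-gpu-layers option.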
