Hardware Question

#1
by Kizna - opened

I'm looking to put together a rig at home that can handle 30B parameter models and below. I know this would mean a pretty penny in terms of hardware, so I wanted to ask somewhere I might actually get a straight answer.

Would used K, P, and M series Tesla GPUs be suitable for that? And how much VRAM would I be looking at to run a 30B model?

You're looking at at least 60GB of VRAM in 16-bit mode. I ran it on the Horde with 2x A6000 and that was good enough...

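For anyone wondering where that 60GB figure comes from, it's essentially just parameter count times bytes per weight; context and activations add more on top, so treat it as a floor. A quick sanity check in Python:

```python
# Back-of-the-envelope estimate: model weights only, 16-bit precision.
# Real usage is higher once you add context (KV cache) and framework overhead.
params = 30e9        # 30B parameters
bytes_per_param = 2  # fp16/bf16

print(f"~{params * bytes_per_param / 1e9:.0f} GB for the weights alone")  # ~60 GB
```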

That's actually not all that much, if earlier series Tesla cards can handle it, that is. Trying to dip my feet in without going huge first.

If you wanna jump in and get your feet wet with THIS model right NOW, with whatever 6GB+ VRAM 10-series-or-newer GPU you already have, don't mind some JANK, don't mind killing a SATA SSD, and don't mind using text-generation-webui to manage your models, you can always abuse Windows' pagefile system or Linux's swap partitions to inflate your RAM pool, then shunt the model over there using the --auto-devices and --gpu-memory options together.
Personally, I'm running it with just an old 120GB Samsung 860 Evo I got for $25 (an 80GB pagefile, with the rest of the drive set aside for Samsung's automatic overprovisioning) and my dinky mobile 1660 Ti 6GB. I'm barely pulling it off without running out of VMEM while chatting with the bot, it'll eventually kill the SSD, and each response takes up to an hour if the history pool gets too big or complex, but it works.
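
If it helps to see what those flags are doing outside the webui: as far as I understand, they wrap the accelerate-style offload in transformers, where you cap how much the GPU is allowed to hold and let the rest spill into system RAM (and from there into your pagefile/swap). A minimal sketch, assuming a local checkpoint at a made-up path and example memory caps you'd size to your own hardware:

```python
from transformers import AutoModelForCausalLM

# Sketch only: "models/some-30b-model" is a placeholder path, and the caps
# below are examples; size them to your actual GPU and RAM/pagefile pool.
model = AutoModelForCausalLM.from_pretrained(
    "models/some-30b-model",
    device_map="auto",                       # let accelerate place layers automatically
    max_memory={0: "5GiB", "cpu": "60GiB"},  # cap GPU 0, push the rest into system RAM
    offload_folder="offload",                # whatever still doesn't fit spills to disk
    torch_dtype="auto",
)
```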

@KSlith Can this run on a 3080 Ti with 12GB of VRAM and 64GB of RAM without adding the pagefile, since I don't want to fry my SSDs?
Also, sorry for hijacking the thread...

It might, but only barely. Remember to cap your GPU RAM usage to ~9GB so there's enough space left for the model to spread its legs. In the 3080 Ti's case, you're going to bottleneck on your system RAM speed.
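
If you're not sure how much headroom you actually have before picking a cap, PyTorch (which the webui already pulls in) can report free vs. total VRAM; just a quick check, nothing the webui itself needs:

```python
import torch

# Report free vs. total VRAM on GPU 0 to help pick a sensible memory cap.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free: {free_bytes / 1024**3:.1f} GiB / total: {total_bytes / 1024**3:.1f} GiB")
```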

@KSlith Is there a way to combine system RAM with the VRAM (any tutorial)? I keep getting CUDA out-of-memory errors when loading any model over 6B, and buying more RAM is much more affordable than any Ampere card. Also, any information about ML models that could help me make my own would be appreciated; I'm very new to ML models.

@Sylveon if you're using text-generation-webui then it's pretty simple.
Check this: https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide

@KSlith thanks for the support. I would love to run the 30B model (currently a 1080 Ti and 32GB of RAM), maybe by buying more system RAM, but would it really be worth it to run a 30B model over a 13B? I know there are metrics for testing how capable a text-gen model is, but I haven't found any examples comparing models' inputs/outputs and really analyzing the differences. Also, do you know of a good tutorial for creating my own model? Something small, just for fun.

What about CPU ONLY?
Let's say 44 cores / 88 threads and 128GB RAM?

@PaulSolo that is a lot of RAM. Have you tried it?
