Any chance of a 13B-20B version?

#7
by smcleod - opened

Is there any chance there will be a slightly smaller version, somewhere between 13B and ~20B, that's likely to run on more common GPUs with 16GB of VRAM?

A lot of the decent coding models coming out seem to be focused on folks with 24GB+ cards.

How do you know that it's only for a 24 GB card? So if we use the code they provided to show how to use it, do we have to have a certain set of specs?

Because a 34B model won't fit on a 16GB GPU. Quantised at 4-bit, however, it should just fit on a 24GB GPU.

Thanks for the response. Is there anything I can read that will help me understand the math better? In other words, how do you know what fits and what does not? I appreciate any information you can pass along. I am assuming that when I build my next PC, I need to get a GPU that will be able to handle these models, like an RTX 4090?
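As a rough rule of thumb: weight memory in GB ≈ parameters (in billions) × bits per weight ÷ 8, plus a couple of GB for the KV cache and runtime overhead. A minimal sketch of that arithmetic (the overhead and bits-per-weight figures below are approximations, not exact specs):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Very rough VRAM estimate: quantised weights plus a fudge factor
    for the KV cache and runtime overhead."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weights_gb + overhead_gb

print(estimate_vram_gb(34, 16))   # fp16: ~70 GB, far beyond any single consumer GPU
print(estimate_vram_gb(34, 4.8))  # Q4_K_M averages ~4.8 bits/weight: ~22 GB, fits 24 GB but not 16 GB
```

That's why 34B at 4-bit targets 24GB cards like the 3090/4090, while a 13B model at 4-bit (~10 GB all in) sits comfortably on a 16GB card.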

@corey4005 - so I was able to get the v2 (phind-codellama-34b-v2.Q4_K_M.gguf) of this model running on my little Tesla P100 (16GB), but it's very slow (2.5-3 tokens/s).

Output generated in 40.89 seconds (2.69 tokens/s, 110 tokens, context 454, seed 403749230)
MEM[|||||||||||||||||15.560Gi/16.000Gi]

V2 GGUF - https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GGUF/blob/main/phind-codellama-34b-v2.Q4_K_M.gguf

Settings:

  • llamacpp_hf
  • gpu layers 33
  • tokens 1024
  • batch 512
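
For reference, the rough equivalent with the standalone llama.cpp CLI (an assumption on my part; the run above used the llamacpp_hf loader in text-generation-webui) would be:

```
# -ngl: layers offloaded to the GPU, -n: max new tokens, -b: batch size
./main -m phind-codellama-34b-v2.Q4_K_M.gguf -ngl 33 -n 1024 -b 512
```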

Install it with cuBLAS. It massively speeds up GPU inference.
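
For anyone following along, a sketch of what that looks like, assuming llama-cpp-python (the build flag has changed names across versions; LLAMA_CUBLAS was current at the time):

```
# llama-cpp-python built with cuBLAS support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# or, for a standalone llama.cpp build
make LLAMA_CUBLAS=1
```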

Does that let you split between CPU and GPU memory though, @johnwick123forevr?

smcleod changed discussion status to closed

Yes, you still split between CPU and GPU memory. Higher GPU layers = more of the model in GPU memory.
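
In llama-cpp-python terms (a sketch; the layer count and batch size mirror the settings posted above), the split is controlled by n_gpu_layers:

```python
from llama_cpp import Llama

# Offload 33 of the model's layers to the GPU; the rest stay in CPU RAM.
llm = Llama(
    model_path="phind-codellama-34b-v2.Q4_K_M.gguf",
    n_gpu_layers=33,
    n_batch=512,
)

out = llm("Write a Python function that reverses a string.", max_tokens=128)
print(out["choices"][0]["text"])
```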
