How much RAM does it need?

#31 by Turbina99

How much RAM does it need? My 24GB was not enough.
It looks like it is being run on the CPU.

24GB should be more than enough for a 6B model... I run the Pygmalion 7B model in full BF16 precision on my 16GB 4080. If it's running on the CPU, then it's more likely that you haven't installed one of the required libraries or something. I would suggest using Oobabooga installed via their installation scripts; here is a link to the Windows version: https://github.com/oobabooga/text-generation-webui/releases/download/installers/oobabooga_windows.zip

The benefit of using their setup script is that it will install everything you need for your hardware. Also, if you tried using the GPU and the memory was not enough, it would likely just die and not work at all; I don't think it would magically switch to CPU mode without you telling it to. So it sounds more like something's not set up for it to use the GPU...

Well, I tried to run it in PyCharm using:

from transformers import pipeline

text_generation = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
generated_text = text_generation("Hello, how are you?")
print(generated_text[0]['generated_text'])

The variant with the GPU also does not work:
from transformers import pipeline

text_generation = pipeline(
    "text-generation",
    model="PygmalionAI/pygmalion-6b",
    device=0,  # specify the GPU device number
)

generated_text = text_generation("Hello, how are you?")
print(generated_text[0]['generated_text'])

and all 24GB of memory fills up.
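(For reference: a 6B-parameter model loaded at the default fp32 precision is about 24 GB of weights alone, 6B × 4 bytes, which is why the memory fills up. Passing torch_dtype to the pipeline roughly halves that. A minimal sketch, assuming a CUDA build of PyTorch and a transformers version that accepts the torch_dtype argument:

import torch
from transformers import pipeline

# Loading the weights in fp16 roughly halves the memory footprint:
# ~12 GB for a 6B model instead of ~24 GB at the default fp32.
text_generation = pipeline(
    "text-generation",
    model="PygmalionAI/pygmalion-6b",
    torch_dtype=torch.float16,  # load weights in half precision
    device=0,                   # GPU device number
)

generated_text = text_generation("Hello, how are you?")
print(generated_text[0]['generated_text'])
)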

Oh, you're trying to do this in code? I'll pass this over to someone else to support; my suggestion is to start with oobabooga, as it has both example code and installs all the libraries you need. As I said earlier, GPU support requires a whole bunch of extra libraries; it's not going to work if you don't have them installed. If your hardware doesn't support 16-bit, then you might have to load the model in 8-bit mode. Again, check ooba for code/requirement examples. Also, remember that oobabooga provides a Kobold-compatible API and a new streaming text API, so you can connect to it via the API and use it that way as well.
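(A minimal sketch of 8-bit loading, assuming the bitsandbytes and accelerate packages are installed alongside transformers; on newer transformers versions the same option is spelled via BitsAndBytesConfig(load_in_8bit=True) passed as quantization_config:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b",
    load_in_8bit=True,   # quantize weights to int8 (~6 GB for a 6B model)
    device_map="auto",   # place layers on the GPU automatically
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
)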

Yes, I tried to load it in PyCharm, because I wanted to attach some logic to it.

I experienced the same thing. It takes 26-27 GB, so it's just barely too big for the GPU. Lower precision would help, but I'm not sure where to set that.

Found you can set the precision by calling model.half().
You also need to handle your input tensors accordingly.

Something to note: your CPU might not support all the fp16 operations, so if you use this it will likely only run on the GPU now. So just make sure to call model.cuda().
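(Putting those two tips together, a minimal sketch, assuming a CUDA build of PyTorch. Note that the tokenized input IDs are integer tensors, so they don't get .half()'d; they just have to be moved to the same device as the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")
model = model.half().cuda()  # fp16 weights on the GPU (~12 GB for 6B params)

# Token IDs stay integer tensors; only move them to the GPU.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
)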

It's running fine on my 8GB 3060 Ti. Sure, the response time is somewhere between 15 and 25 seconds, but I can live with that.
