Model usage with GPU and CPU cores usage

#10
by Crocha - opened

Hello everyone, I'm loading the model to run on GPU; my setup has 6 x RTX 3090 cards.
I'm loading the model with the code below, using load_in_8bit because of the amount of GPU memory on the machines.

```python
from transformers import AutoTokenizer, LlamaForCausalLM

# Shard the model across the available GPUs and quantize weights to 8-bit
model = LlamaForCausalLM.from_pretrained(
    "nomic-ai/gpt4all-13b-snoozy",
    device_map="auto",
    low_cpu_mem_usage=True,
    load_in_8bit=True,
)
# The tokenizer runs on CPU; device_map / quantization kwargs don't apply to it
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/gpt4all-13b-snoozy")
```
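As a side note, newer transformers releases prefer passing quantization through an explicit `BitsAndBytesConfig` instead of the bare `load_in_8bit` flag. A minimal sketch of that variant (same model ID as above; the per-GPU memory caps are illustrative assumptions, not measured values):

```python
from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM

# Equivalent 8-bit load via the explicit quantization config object
model = LlamaForCausalLM.from_pretrained(
    "nomic-ai/gpt4all-13b-snoozy",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    # Optional: cap how much each of the 6 x 3090s may hold (values assumed)
    max_memory={i: "20GiB" for i in range(6)},
)
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/gpt4all-13b-snoozy")
```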

I see that even when running on the GPUs, the model still uses some of the CPU cores. In the image below you can see that it is mostly limited to a single core pinned at 100%. Would it be possible to stop the model from using the CPU, or to force it to use all the cores during processing?
[screenshot: CPU usage, one core at 100%]
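Some CPU use is expected: tokenization, sampling, and the Python code that launches the GPU kernels all run on the CPU. If the single pinned core is actually CPU-side PyTorch work (an assumption; it may just be the process driving the GPUs), you could experiment with raising the intra-op thread count:

```python
import os
import torch

# Allow PyTorch's CPU-side ops to use every core instead of a lower default
torch.set_num_threads(os.cpu_count())
print(torch.get_num_threads())  # verify the new thread count
```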

Another question: is it normal for the model to alternate between the GPUs? It is a little clearer in this image:
[screenshot: GPU utilization alternating across cards]

You can see that some GPUs are active while the others sit idle temporarily. Is this normal behavior for the model?
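This alternation is consistent with how `device_map="auto"` works: accelerate assigns contiguous ranges of layers to each GPU, and a forward pass moves through them in order, so only one GPU tends to be busy at any moment (pipeline-style splitting, not parallel execution). You can inspect the placement that was chosen, as a quick check:

```python
# hf_device_map records which device each module was dispatched to
for module_name, device in model.hf_device_map.items():
    print(f"{device}: {module_name}")
```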
