How much memory do I need for this model (on Windows)?

#77 by roboboot

I'm trying to run this model on Windows 11, with 48 GB of RAM and without GPU.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "../Mixtral-8x7B-Instruct-v0.1"
self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu", low_cpu_mem_usage=True)
self.tokenizer = AutoTokenizer.from_pretrained(model_id)

I can see RAM usage rise until this error occurs:

Loading checkpoint shards: 16%|█▌ | 3/19 [01:13<06:31, 24.49s/it]
Process finished with exit code -1073741819 (0xC0000005)

Do I need more memory, or is there something else I can do?

thx

R.

I have run this model with ChatLLM.cpp:

  • For quantized int4, 32 GB of RAM is enough;
  • For quantized int8, 64 GB of RAM is enough.
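
As a rough sanity check of those numbers: Mixtral-8x7B has about 46.7B parameters in total, so the weights alone take roughly 23 GB at int4 and 47 GB at int8, before the KV cache and runtime overhead. For comparison, the unquantized fp16 weights are around 93 GB, so 48 GB of RAM cannot hold the full-precision model at all, which is why your loading attempt crashed partway through the checkpoint shards.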

I think it is impossible to run it with PyTorch on CPU, because PyTorch is not as efficient as GGML on CPU.
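
If you want a concrete starting point, here is a minimal sketch using llama-cpp-python, another GGML/GGUF runtime in the same family as ChatLLM.cpp. The GGUF file name and thread count are assumptions; point them at whichever int4 quantization you actually download (a Q4 GGUF of Mixtral is on the order of 25-30 GB on disk, so it fits the 32 GB figure above):

    # Minimal CPU-only sketch with llama-cpp-python (a GGML/GGUF runtime).
    # Assumes a Q4 (int4) GGUF quantization of Mixtral-8x7B-Instruct has
    # already been downloaded; the path below is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,    # context window
        n_threads=8,   # CPU threads; tune to your machine
    )

    out = llm("[INST] Explain what a mixture-of-experts model is. [/INST]",
              max_tokens=128)
    print(out["choices"][0]["text"])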

OK, how can I use int4 quantization?

Do I have to use "load_in_4bit=True"?

thx

R

Yes, I've just tried:

    self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu", low_cpu_mem_usage=True, load_in_4bit=True)
    self.tokenizer = AutoTokenizer.from_pretrained(model_id)

And I get an error saying that 4-bit quantization is not possible without a GPU.

You are right :(
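
For context: "load_in_4bit=True" goes through the bitsandbytes library, which at the time of this thread required a CUDA GPU, so 4-bit loading via transformers fails on a CPU-only machine. The GGML/GGUF route above is the usual CPU-only workaround.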

thx
