error loading model

#2
by usrme - opened

This line went wrong:
model = AutoModel.from_pretrained("d:/chatglm-6b-int4-slim", trust_remote_code=True).half()#.quantize(4).cuda()

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.

But I actually have 16 GB of RAM (no GPU).
This model is supposed to run in 6 GB of RAM on CPU. How can I handle this?
Thanks!

Maybe your memory has been consumed by other programs?

You can try this one:

https://huggingface.co/silver/chatglm-6b-int4-qe-slim

It consumes less memory.

Sorry, I tried the chatglm-6b-int4-qe-slim model, but I still get the same error.
I just registered yesterday to post feedback, but the platform forbids new users from posting frequently: currently 1 comment per day, so I have to wait a whole day between replies.

I tried the pure Python code given in the model card; no C code is involved. Is that the root cause of the error?
I'm pretty sure my laptop has sufficient RAM.
Any other clues?
Thanks.

Could you please print out how much memory is left before loading the model?

import psutil
print(psutil.virtual_memory())

2023-04-06 16:55:27.196480: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-04-06 16:55:27.196696: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
svmem(total=16727375872, available=7232872448, percent=56.8, used=9494503424, free=7232872448)
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
ChatGLMTokenizer(name_or_path='d:/chatglm-6b-int4-slim', vocab_size=130344, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': '', 'pad_token': '', 'mask_token': '[MASK]'}) █
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "D:\chatglm-6b-int4-slim\run.py", line 6, in <module>
model = AutoModel.from_pretrained("d:/chatglm-6b-int4-slim", trust_remote_code=True).half()#.quantize(4).cuda()
File "D:\soft\prog\python\lib\site-packages\transformers\models\auto\auto_factory.py", line 459, in from_pretrained
return model_class.from_pretrained(
File "D:\soft\prog\python\lib\site-packages\transformers\modeling_utils.py", line 2362, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1019, in __init__
self.transformer = ChatGLMModel(config)
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 825, in __init__
[get_layer(layer_id) for layer_id in range(self.num_layers)]
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 825, in <listcomp>
[get_layer(layer_id) for layer_id in range(self.num_layers)]
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 811, in get_layer
return GLMBlock(
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 586, in __init__
self.mlp = GLU(
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 513, in __init__
self.dense_h_to_4h = skip_init(
File "D:\soft\prog\python\lib\site-packages\torch\nn\utils\init.py", line 51, in skip_init
return module_cls(*args, **kwargs).to_empty(device=final_device)
File "D:\soft\prog\python\lib\site-packages\torch\nn\modules\module.py", line 788, in to_empty
return self._apply(lambda t: torch.empty_like(t, device=device))
File "D:\soft\prog\python\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
param_applied = fn(param)
File "D:\soft\prog\python\lib\site-packages\torch\nn\modules\module.py", line 788, in <lambda>
return self._apply(lambda t: torch.empty_like(t, device=device))
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.
Press any key to continue . . .
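As a side note, the size of the failed allocation lines up with a single weight matrix. This is my own back-of-the-envelope check (assuming ChatGLM-6B's usual config of hidden size 4096 and inner hidden size 16384 for the `dense_h_to_4h` projection, which is the layer named in the traceback):

```python
# Hypothetical sanity check: does the failed 134217728-byte allocation
# match one fp16 dense_h_to_4h weight matrix (4096 -> 16384)?
hidden_size = 4096         # ChatGLM-6B hidden size (assumed from the config)
inner_hidden_size = 16384  # 4 * hidden_size, the h_to_4h projection (assumed)
bytes_per_fp16 = 2         # half precision, matching the .half() call

alloc = hidden_size * inner_hidden_size * bytes_per_fp16
print(alloc)  # 134217728 — exactly the size in the RuntimeError
```

So the crash happens while materializing one of the fp16 layer skeletons, i.e. memory ran out partway through building the model, not on some unusually large tensor.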

svmem(total=16727375872, available=7232872448, percent=56.8, used=9494503424, free=7232872448)

You have about 6.7 GiB of free memory to use before loading the model. You'd better use a machine with more memory, or free up more memory before loading the model.
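To see why 6.7 GiB may not be enough even for an int4 model: the traceback shows `skip_init` materializing empty fp16 tensors for every layer, so the whole fp16 skeleton appears to be allocated during construction, before any quantized weights are loaded. A rough estimate (my own reasoning, with an assumed parameter count, not a figure from the model card):

```python
# Rough peak-memory estimate for constructing the model skeleton in fp16.
n_params = 6.2e9    # approximate ChatGLM-6B parameter count (assumed)
bytes_per_fp16 = 2  # half precision

peak_gib = n_params * bytes_per_fp16 / 2**30
print(round(peak_gib, 1))  # 11.5 — well above the 6.7 GiB available
```

Under that assumption, even the int4 variants need on the order of 12 GiB free just to instantiate, which is why closing other programs (or using a larger machine) is the practical fix here.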

Thanks a lot! It seems poor guys have little chance to try these modern toys... :(
Issue closed.

silver changed discussion status to closed
