error loading model

#2
by usrme - opened

This line went wrong:
model = AutoModel.from_pretrained("d:/chatglm-6b-int4-slim", trust_remote_code=True).half()#.quantize(4).cuda()

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.

But I actually have 16 GB of RAM (no GPU).
This model is supposed to run in 6 GB of RAM on CPU. How can I handle this?
Thanks!

Maybe your memory has been consumed by other programs?

You can try this one:

https://huggingface.co/silver/chatglm-6b-int4-qe-slim

It consumes less memory.

Sorry, I tried the chatglm-6b-int4-qe-slim model, but I still get the same error.
I just registered yesterday to post feedback, but the platform forbids new users from posting frequently: currently 1 comment per day, so I have to wait a whole day between replies.

I tried the pure Python code given in the model card; no C code is involved. Is that the root cause of the error?
I'm pretty sure my laptop has sufficient RAM.
Any other clues?
Thanks.

Could you please print out how much memory is left before loading the model?

import psutil
print(psutil.virtual_memory())

2023-04-06 16:55:27.196480: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-04-06 16:55:27.196696: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
svmem(total=16727375872, available=7232872448, percent=56.8, used=9494503424, free=7232872448)
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
ChatGLMTokenizer(name_or_path='d:/chatglm-6b-int4-slim', vocab_size=130344, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': '', 'pad_token': '', 'mask_token': '[MASK]'}) █
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "D:\chatglm-6b-int4-slim\run.py", line 6, in <module>
model = AutoModel.from_pretrained("d:/chatglm-6b-int4-slim", trust_remote_code=True).half()#.quantize(4).cuda()
File "D:\soft\prog\python\lib\site-packages\transformers\models\auto\auto_factory.py", line 459, in from_pretrained
return model_class.from_pretrained(
File "D:\soft\prog\python\lib\site-packages\transformers\modeling_utils.py", line 2362, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1019, in __init__
self.transformer = ChatGLMModel(config)
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 825, in __init__
[get_layer(layer_id) for layer_id in range(self.num_layers)]
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 825, in <listcomp>
[get_layer(layer_id) for layer_id in range(self.num_layers)]
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 811, in get_layer
return GLMBlock(
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 586, in __init__
self.mlp = GLU(
File "C:\Users\10020252/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 513, in __init__
self.dense_h_to_4h = skip_init(
File "D:\soft\prog\python\lib\site-packages\torch\nn\utils\init.py", line 51, in skip_init
return module_cls(*args, **kwargs).to_empty(device=final_device)
File "D:\soft\prog\python\lib\site-packages\torch\nn\modules\module.py", line 788, in to_empty
return self._apply(lambda t: torch.empty_like(t, device=device))
File "D:\soft\prog\python\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
param_applied = fn(param)
File "D:\soft\prog\python\lib\site-packages\torch\nn\modules\module.py", line 788, in <lambda>
return self._apply(lambda t: torch.empty_like(t, device=device))
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.
Press any key to continue . . .
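As a side note, the size of the failed allocation lines up with a single weight matrix. This is my own back-of-the-envelope check (assuming ChatGLM-6B's usual config of hidden size 4096 and inner hidden size 16384 for the `dense_h_to_4h` projection, which is the layer named in the traceback):

```python
# Hypothetical sanity check: does the failed 134217728-byte allocation
# match one fp16 dense_h_to_4h weight matrix (4096 -> 16384)?
hidden_size = 4096         # ChatGLM-6B hidden size (assumed from the config)
inner_hidden_size = 16384  # 4 * hidden_size, the h_to_4h projection (assumed)
bytes_per_fp16 = 2         # half precision, matching the .half() call

alloc = hidden_size * inner_hidden_size * bytes_per_fp16
print(alloc)  # 134217728 — exactly the size in the RuntimeError
```

So the crash happens while materializing one of the fp16 layer skeletons, i.e. memory ran out partway through building the model, not on some unusually large tensor.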

svmem(total=16727375872, available=7232872448, percent=56.8, used=9494503424, free=7232872448)

You have about 6.7 GiB of free memory to use before loading the model. You'd better use a machine with more memory, or free up more memory before loading the model.
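To see why 6.7 GiB may not be enough even for an int4 model: the traceback shows `skip_init` materializing empty fp16 tensors for every layer, so the whole fp16 skeleton appears to be allocated during construction, before any quantized weights are loaded. A rough estimate (my own reasoning, with an assumed parameter count, not a figure from the model card):

```python
# Rough peak-memory estimate for constructing the model skeleton in fp16.
n_params = 6.2e9    # approximate ChatGLM-6B parameter count (assumed)
bytes_per_fp16 = 2  # half precision

peak_gib = n_params * bytes_per_fp16 / 2**30
print(round(peak_gib, 1))  # 11.5 — well above the 6.7 GiB available
```

Under that assumption, even the int4 variants need on the order of 12 GiB free just to instantiate, which is why closing other programs (or using a larger machine) is the practical fix here.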

Thanks a lot! It seems poor guys have little chance to try these modern toys... :(
Issue closed.

silver changed discussion status to closed
