RuntimeError: CUDA error: an illegal memory access was encountered
#6 opened by MorphzZ
Hi Bloke,
Hope you are well. I am trying to use this model and get this error:
>>> model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quant_config, device_map={"":0})
Loading checkpoint shards: 0%| | 0/3 [00:06<?, ?it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2897, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3236, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 718, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/utils/bitsandbytes.py", line 91, in set_module_quantized_tensor_to_device
new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 176, in to
return self.cuda(device)
^^^^^^^^^^^^^^^^^
File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 153, in cuda
w = self.data.contiguous().half().cuda(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
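As the error message suggests, a more accurate stack trace can be obtained by forcing synchronous kernel launches. A minimal sketch (the variable has to be set before torch initializes CUDA, so ideally before torch is imported):

```python
import os

# Must be set before torch creates a CUDA context,
# i.e. before the first CUDA call (safest: before importing torch).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402
# ... then re-run the from_pretrained call; the failing kernel
# should now be reported at the line that actually launched it.
```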
I have:
>>> torch.__version__
'2.0.1+cu117'
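The other packages in the traceback can be version-checked the same way; a quick sketch (accelerate is an assumption here, included since device_map relies on it):

```python
from importlib.metadata import version

# transformers and bitsandbytes appear in the traceback,
# so their versions matter alongside torch's.
for pkg in ("torch", "transformers", "bitsandbytes", "accelerate"):
    print(pkg, version(pkg))
```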
Could you please help? Is there another model I can use that does not give this error? The model card says:
If you get this issue ("illegal memory access") then you should use a newer HF LLaMA conversion or downgrade your PyTorch version.
I guess I need to re-convert the model weights.
But in the meantime, why not use Vicuna 1.3 instead? It's an upgrade over 1.1 and was released much more recently, so it hopefully won't have this problem (which I believe is caused by models created with an older version of transformers).
You can download the Vicuna 1.3 model here: https://huggingface.co/lmsys/vicuna-13b-v1.3
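For reference, a minimal 4-bit loading sketch for that model. The BitsAndBytesConfig values here are assumptions, since the original quant_config was never shown; adjust them to match your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lmsys/vicuna-13b-v1.3"

# Example 4-bit NF4 config, as commonly used for QLoRA-style fine-tuning.
# These exact settings are an assumption, not the poster's original config.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map={"": 0},  # put all weights on GPU 0, matching the original call
)
```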
Thanks, Bloke. That works.
MorphzZ changed discussion status to closed