RuntimeError: CUDA error: an illegal memory access was encountered

#6
by MorphzZ - opened

Hi Bloke,

Hope you are well. I am trying to use this model and get this error:

>>> model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quant_config, device_map={"":0})
Loading checkpoint shards:   0%|                                 | 0/3 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2897, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3236, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 718, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/transformers/utils/bitsandbytes.py", line 91, in set_module_quantized_tensor_to_device
    new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 176, in to
    return self.cuda(device)
           ^^^^^^^^^^^^^^^^^
  File "/mnt/disks/sdb/finetuning-with-qlora/.env/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 153, in cuda
    w = self.data.contiguous().half().cuda(device)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I have

>>> torch.__version__
'2.0.1+cu117'
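Not a fix, but since the error message notes that CUDA kernel errors are reported asynchronously, a first debugging step is to make launches synchronous so the stack trace points at the real failing call. One way (the shell equivalent is `export CUDA_LAUNCH_BLOCKING=1`):

```python
import os

# CUDA kernel launches are asynchronous, so the Python stack trace can
# point at the wrong call. Making every launch synchronous is slow but
# surfaces the actual failing operation. This must be set before torch
# initializes CUDA, i.e. before `import torch` runs.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```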

Could you please help? Is there another model I can use that does not give this error? I found this advice here:

> If you get this issue ("illegal memory access") then you should use a newer HF LLaMA conversion or downgrade your PyTorch version.

I guess I need to re-convert the model weights.

But in the meantime, why not use Vicuna 1.3 instead? It's an upgrade over 1.1 and was made much more recently, so hopefully it won't have this problem (which I believe is caused by models created with an older version of transformers).

You can download the Vicuna 1.3 model here: https://huggingface.co/lmsys/vicuna-13b-v1.3
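For reference, a minimal loading sketch for that model. This assumes a 4-bit `BitsAndBytesConfig` similar to the `quant_config` in the original call; the exact quantization parameters are an assumption, so adjust them to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lmsys/vicuna-13b-v1.3"

# Assumed 4-bit QLoRA-style config -- tune these values to your needs.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map={"": 0},  # all weights on GPU 0, as in the original call
)
```

This downloads roughly 26 GB of weights and needs a CUDA GPU, so run it on the same machine where the original error occurred.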

Thanks Bloke, that works.

MorphzZ changed discussion status to closed