Error using multiple GPUs

#20
by JesusUned - opened

I'm trying to use the model with 4x NVIDIA RTX A5000 GPUs (24 GB each). When I try to load the model I get the error below; the same error appears whether I set device_map to auto or to sequential.

Thanks!!

      1 gpu='auto'
----> 2 model = AutoModel.from_pretrained('intfloat/e5-mistral-7b-instruct', device_map=gpu).cuda()
      3 tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-mistral-7b-instruct')
      5 max_length = 4096

File ~/llama/lib/python3.11/site-packages/accelerate/big_modeling.py:416, in dispatch_model.<locals>.add_warning.<locals>.wrapper(*args, **kwargs)
    414     if param.device == torch.device("meta"):
    415         raise RuntimeError("You can't move a model that has some modules offloaded to cpu or disk.")
--> 416 return fn(*args, **kwargs)

File ~/llama/lib/python3.11/site-packages/transformers/modeling_utils.py:2243, in PreTrainedModel.cuda(self, *args, **kwargs)
   2238     raise ValueError(
   2239         "Calling `cuda()` is not supported for `4-bit` or `8-bit` quantized models. Please use the model as it is, since the"
   2240         " model has already been set to the correct devices and casted to the correct `dtype`."
   2241     )
   2242 else:
-> 2243     return super().cuda(*args, **kwargs)

File ~/llama/lib/python3.11/site-packages/torch/nn/modules/module.py:918, in Module.cuda(self, device)
    901 def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
    902     r"""Moves all model parameters and buffers to the GPU.
    903 
    904     This also makes associated parameters and buffers different objects. So
   (...)
    916         Module: self
    917     """
--> 918     return self._apply(lambda t: t.cuda(device))

File ~/llama/lib/python3.11/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
    808 if recurse:
    809     for module in self.children():
--> 810         module._apply(fn)
    812 def compute_should_use_set_data(tensor, tensor_applied):
    813     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    814         # If the new tensor has compatible tensor type as the existing tensor,
    815         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    820         # global flag to let the user control whether they want the future
    821         # behavior of overwriting the existing tensor or not.

File ~/llama/lib/python3.11/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
    808 if recurse:
    809     for module in self.children():
--> 810         module._apply(fn)
    812 def compute_should_use_set_data(tensor, tensor_applied):
    813     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    814         # If the new tensor has compatible tensor type as the existing tensor,
    815         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    820         # global flag to let the user control whether they want the future
    821         # behavior of overwriting the existing tensor or not.

    [... skipping similar frames: Module._apply at line 810 (1 times)]

File ~/llama/lib/python3.11/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
    808 if recurse:
    809     for module in self.children():
--> 810         module._apply(fn)
    812 def compute_should_use_set_data(tensor, tensor_applied):
    813     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    814         # If the new tensor has compatible tensor type as the existing tensor,
    815         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    820         # global flag to let the user control whether they want the future
    821         # behavior of overwriting the existing tensor or not.

File ~/llama/lib/python3.11/site-packages/torch/nn/modules/module.py:833, in Module._apply(self, fn, recurse)
    829 # Tensors stored in modules are graph leaves, and we don't want to
    830 # track autograd history of `param_applied`, so we have to use
    831 # `with torch.no_grad():`
    832 with torch.no_grad():
--> 833     param_applied = fn(param)
    834 should_use_set_data = compute_should_use_set_data(param, param_applied)
    835 if should_use_set_data:

File ~/llama/lib/python3.11/site-packages/torch/nn/modules/module.py:918, in Module.cuda.<locals>.<lambda>(t)
    901 def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
    902     r"""Moves all model parameters and buffers to the GPU.
    903 
    904     This also makes associated parameters and buffers different objects. So
   (...)
    916         Module: self
    917     """
--> 918     return self._apply(lambda t: t.cuda(device))

OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 1.27 GiB is free. Including non-PyTorch memory, this process has 22.41 GiB memory in use. Of the allocated memory 22.21 GiB is allocated by PyTorch, and 1.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.

You may try to load with float16:
import torch  # needed for torch.float16
model = AutoModel.from_pretrained('intfloat/e5-mistral-7b-instruct', torch_dtype=torch.float16).cuda()

This will halve the GPU memory requirements.
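
Also, judging from the traceback, part of the problem may be the extra .cuda() call itself: with device_map='auto', Accelerate has already sharded the weights across the four GPUs, and .cuda() then tries to pull every parameter onto GPU 0. If you want to keep the model spread over all four cards, a minimal sketch (assuming accelerate is installed) would be to drop the .cuda() call entirely:

import torch
from transformers import AutoModel, AutoTokenizer

# Let Accelerate place the shards across all visible GPUs; do not call .cuda()
# afterwards, since that would try to move every parameter onto a single device.
model = AutoModel.from_pretrained(
    'intfloat/e5-mistral-7b-instruct',
    torch_dtype=torch.float16,  # optional, halves the memory as suggested above
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-mistral-7b-instruct')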

Thanks, problem solved!!

Thanks @intfloat, this worked like a charm for me.
Quick follow-up question, please: does this affect the dimension or the output values in any way? Would I need to multiply the output values by a factor?
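
Not the author, but converting to float16 does not change the embedding dimension, and there is no scaling factor to apply; the values just pick up small rounding differences. One way to sanity-check this yourself is to embed the same text in both precisions and compare. A rough sketch (the mean pooling here is only for illustration, not necessarily the pooling from the model card, and the float32 pass runs on CPU because it would not fit on a 24 GB card):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

name = 'intfloat/e5-mistral-7b-instruct'
tokenizer = AutoTokenizer.from_pretrained(name)
text = 'how much protein should a female eat'

def embed(dtype, device):
    # Load in the requested dtype, run one forward pass and mean-pool the
    # last hidden state into a single vector (pooling choice is illustrative).
    model = AutoModel.from_pretrained(name, torch_dtype=dtype).to(device).eval()
    batch = tokenizer(text, return_tensors='pt').to(device)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    return hidden.mean(dim=1).float().cpu()

v32 = embed(torch.float32, 'cpu')   # float32 weights (~28 GB) stay on CPU
v16 = embed(torch.float16, 'cuda')
print(v32.shape, v16.shape)                    # same dimension in both cases
print(F.cosine_similarity(v32, v16).item())    # should be very close to 1.0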

Does anybody know how I can implement this through LangChain, which uses Sentence Transformers rather than AutoModel?
It's a maze navigating through so many dependencies, and I'm not sure how to pass the torch_dtype.
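
Not 100% sure about every version combination, but LangChain's HuggingFaceEmbeddings forwards its model_kwargs dict to the SentenceTransformer constructor, and recent sentence-transformers releases accept a nested model_kwargs there that is passed on to the underlying Hugging Face model. Under those assumptions (and note that, depending on your LangChain version, the import may live in langchain_community.embeddings instead), a sketch would be:

import torch
from langchain.embeddings import HuggingFaceEmbeddings

# Assumption: sentence-transformers is recent enough that the SentenceTransformer
# constructor accepts a `model_kwargs` dict which reaches AutoModel.from_pretrained;
# the outer dict below goes to the constructor, the inner one carries the dtype.
embeddings = HuggingFaceEmbeddings(
    model_name='intfloat/e5-mistral-7b-instruct',
    model_kwargs={
        'device': 'cuda',
        'model_kwargs': {'torch_dtype': torch.float16},
    },
    encode_kwargs={'normalize_embeddings': True},
)

vec = embeddings.embed_query('how much protein should a female eat')
print(len(vec))  # 4096-dimensional embedding

# If your sentence-transformers version does not accept the nested model_kwargs,
# another option is to load on CPU and cast the wrapped model (a regular torch
# module, exposed as embeddings.client) with .half() before moving it to the GPU.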
