textgen webui CUDA memory error on clear cache

#6
by Yhyu13 - opened

This looks like an error in the Mixtral expert selection. Does anyone else have the same issue? I just want to know whether it is a known bug for this model, or maybe a bug in the code.

I am on textgen webui https://github.com/oobabooga/text-generation-webui/commit/d8c3a5bee814f09b0868474002105dcf21a3ff1a

Ubuntu 20.04
RTX3090
Nvidia 545.23.08

Traceback (most recent call last):
  File "/home/hangyu5/Documents/Gitrepo-My/text-generation-webui/modules/callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/Documents/Gitrepo-My/text-generation-webui/modules/text_generation.py", line 376, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/generation/utils.py", line 2861, in sample
    outputs = self(
              ^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1222, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1090, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 819, in forward
    hidden_states, router_logits = self.block_sparse_moe(hidden_states)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 736, in forward
    idx, top_x = torch.where(expert_mask[expert_idx])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
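
The hint at the end of the traceback can be followed by re-running the server with `CUDA_LAUNCH_BLOCKING=1`, so that kernel launches are synchronous and the failing call is more likely to be the one shown in the stack trace. The snippet below is only a minimal sketch of that idea: the helper names `blocking_env` and `rerun_blocking` are hypothetical, and the `server.py` entry point in the usage comment is an assumption about how the webui is started on this machine.

```python
import os
import subprocess


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    The input mapping is not modified; os.environ is used when base is None.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(cmd):
    """Run cmd (a list of arguments) with CUDA_LAUNCH_BLOCKING=1 and return its exit code."""
    return subprocess.run(cmd, env=blocking_env(), check=False).returncode


# Hypothetical usage, assuming the webui is launched via server.py:
#   rerun_blocking(["python", "server.py"])
```

Generation will be slower with blocking launches, so this is meant for reproducing the error once, not for normal use.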
Owner

I believe it's a bug in the code.

These kinds of errors usually happen (especially on Linux) when you don't have enough VRAM available.
See this:
https://stackoverflow.com/questions/68106457/pytorch-cuda-error-an-illegal-memory-access-was-encountered
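
One way to check the VRAM explanation is to look at how much memory is in use on the GPU just before generating. The following is a rough sketch, not part of the webui: it shells out to `nvidia-smi` with its CSV query options and parses the result. The function names `parse_memory` and `query_memory` are made up for illustration, and the sketch assumes `nvidia-smi` is on the `PATH` and prints one `used, total` line per GPU in MiB.

```python
import shutil
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(text):
    """Parse 'used, total' lines (MiB) into a list of dicts, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return rows


def query_memory():
    """Return per-GPU memory usage, or an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)


# Hypothetical usage:
#   for gpu in query_memory():
#       print(gpu)
```

If the free figure is close to zero while the model is loaded, the out-of-VRAM explanation becomes more plausible; if there is plenty of headroom, the problem probably lies elsewhere.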

TomGrc changed discussion status to closed
