Problem with tensor parallel size==2

#14
by borodulinaad - opened

With tensor_parallel_size=2 I get an error, while with tensor_parallel_size=1 everything works as expected.

Here's the error:
RuntimeError: Worker failed with error 'RuntimeError when making fake tensor call
[core.py:1202] Explanation: Dynamo failed to run FX node with fake tensors: call_function <built-in method matmul of type object at 0x7f6116ee5f40>(*(FakeTensor(..., device='cuda:0', size=(s21, (s88//s21), s3),
[core.py:1202] dtype=torch.bfloat16), Parameter(FakeTensor(..., device='cuda:0', size=(131072, 2816), dtype=torch.bfloat16))), **{}): got RuntimeError('a and b must have same reduction dim, but got [s21*((s88//s21)), s3] X [131072, 2816].')
[core.py:1202] Hint: Your code may result in an error when running in eager. Please double check that your code doesn't contain a similar error when actually running eager/uncompiled. You can do this by removing the torch.compile call, or by using torch.compiler.set_stance("force_eager").
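
For anyone reading the trace: the failing check is the usual matmul rule that the last dimension of the left operand must equal the first dimension of the right operand. The sketch below reproduces only that shape rule in plain Python. The numbers 131072 and 2816 come from the error message. Everything else is an assumption made for illustration, not something the thread confirms: that the weight's first dimension is split evenly across 2 ranks (so the unsharded size would be 262144), that the activation keeps the unsharded size, and the helper name `matmul_2d_shape`.

```python
def matmul_2d_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matmul, or raise if the reduction dims differ."""
    (m, k_a), (k_b, n) = a_shape, b_shape
    if k_a != k_b:
        raise RuntimeError(
            "a and b must have same reduction dim, "
            f"but got {list(a_shape)} X {list(b_shape)}."
        )
    return (m, n)


HIDDEN = 2816          # second weight dim, taken from the error message
SHARD_ROWS = 131072    # first weight dim, taken from the error message
TP_SIZE = 2            # tensor_parallel_size from the report
FULL_ROWS = SHARD_ROWS * TP_SIZE  # ASSUMPTION: even split across ranks

tokens = 8  # hypothetical number of rows in the activation

# Unsharded weight: the reduction dims agree, so the matmul is well-formed.
print(matmul_2d_shape((tokens, FULL_ROWS), (FULL_ROWS, HIDDEN)))

# Sharded weight with an unsharded activation (ASSUMED scenario):
# the reduction dims disagree, which is the same kind of mismatch as in the trace.
try:
    matmul_2d_shape((tokens, FULL_ROWS), (SHARD_ROWS, HIDDEN))
except RuntimeError as err:
    print(err)
```

If that reading is right, it would explain why tensor_parallel_size=1 works and 2 does not, but the actual cause is whatever the linked fix addresses.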

Library versions:
vllm==0.23.1rc1.dev54+g3f1ff1ff1 (nightly from https://recipes.vllm.ai/Google/diffusiongemma-26B-A4B-it?hardware=h100)
transformers==5.12.1
torch==2.11.0
torchaudio==2.11.0+cu130
torchvision==0.26.0+cu130
accelerate==1.14.0

Thanks for reporting! Another user has proposed a fix in https://github.com/vllm-project/vllm/pull/45774. Either that PR will be merged soon, or we will send another PR.
