RuntimeError: Shape mismatch: a.size(1) = 8192, size_k = 4096

#2
by LIFengJu - opened

My Environment:
GPU: H200*8
Python: 3.11
CUDA: 13.3
OS: Linux x86_64
PyTorch: 2.12.0 (torch2.12.0+cu130)

The error occurs during 8-GPU distributed inference launched with torchrun.
Error Description
After I enter any prompt in interactive mode (e.g., "hello"), the program raises an error and exits. Every rank reports the same RuntimeError: Shape mismatch; the rank 0 traceback is below.

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/generate.py", line 159, in <module>
[rank0]:     main(args.ckpt_path, args.config, args.input_file, args.interactive, args.max_new_tokens, args.temperature)
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/generate.py", line 130, in main
[rank0]:     completion_tokens = generate(model, [prompt_tokens], max_new_tokens, tokenizer.eos_token_id, temperature)
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/generate.py", line 53, in generate
[rank0]:     logits = model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/model.py", line 972, in forward
[rank0]:     h = layer(h, start_pos, input_ids)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/model.py", line 842, in forward
[rank0]:     x = self.attn(x, start_pos)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/model.py", line 646, in forward
[rank0]:     o = self.wo_a(o.flatten(2)).view(bsz, seqlen, self.n_local_groups, self.o_lora_rank)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/model.py", line 260, in forward
[rank0]:     return Linear.forward(self, x)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/all_shared/models/Intel/DeepSeek-V4-Pro-W4A16-AutoRound/inference/model.py", line 241, in forward
[rank0]:     y = self._woq(x.to(torch.bfloat16))
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 301, in forward
[rank0]:     out = apply_gptq_marlin_linear(
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/gptqmodel/utils/marlin.py", line 216, in apply_gptq_marlin_linear
[rank0]:     output = gptq_marlin_gemm(reshaped_x,
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/user/anaconda3/envs/dsv4/lib/python3.11/site-packages/gptqmodel/utils/marlin.py", line 299, in gptq_marlin_gemm
[rank0]:     return gptqmodel_marlin_kernels.gptq_marlin_gemm(a, c, b_q_weight, b_bias, b_scales,
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Shape mismatch: a.size(1) = 8192, size_k = 4096

By the way, the model is "Intel/DeepSeek-V4-Pro-W4A16-AutoRound".
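
To make the failing check concrete, here is a minimal sketch in plain Python of the comparison the error message describes: the inner dimension of the activation (`a.size(1)`) against the `size_k` the quantized weight expects. The helper name `explain_k_mismatch` is hypothetical and is not part of gptqmodel or the model's inference code. It only restates the numbers from the traceback. The exact factor of 2 might point to the `wo_a` weight and its input being split differently across ranks or groups, but that is an assumption, not something the traceback confirms.

```python
def explain_k_mismatch(a_shape, size_k):
    """Compare an activation's inner (K) dimension with the K a weight expects.

    a_shape: shape of the 2-D activation passed to the GEMM, e.g. (tokens, 8192)
    size_k:  input-feature count the quantized weight was packed for
    (Hypothetical helper for illustration only.)
    """
    a_k = a_shape[-1]
    if a_k == size_k:
        return "ok"
    if a_k % size_k == 0:
        return f"activation K={a_k} is {a_k // size_k}x weight K={size_k}"
    if size_k % a_k == 0:
        return f"weight K={size_k} is {size_k // a_k}x activation K={a_k}"
    return f"activation K={a_k} and weight K={size_k} are unrelated"


# 8192 and 4096 are copied from the error message; the token count of 1 is a placeholder.
print(explain_k_mismatch((1, 8192), 4096))
```

With the values from this report it prints `activation K=8192 is 2x weight K=4096`, i.e. the tensor produced by `o.flatten(2)` is exactly twice as wide as the input the `wo_a` layer was packed for.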
