Dimension mismatch crashes the engine at startup

#18
by Shaulsh - opened

From the Dynamo/FakeTensor trace, the crash is a single F.linear(x, W) call:

call_function torch.nn.functional.linear(
x = FakeTensor(size=(1, s70, 4096), bf16), # input: [..., 4096]
W = Parameter(size=(3840, 8192), bf16), # weight: [out=3840, in=8192]
bias=None
) → RuntimeError: a and b must have same reduction dim, but got [s70, 4096] X [8192, 3840]
An nn.Linear stores its weight as [out_features, in_features], so this layer expects an 8192‑wide input and produces a 3840‑wide output (the text hidden size). But the tensor handed to it is only 4096 wide. linear computes x @ Wᵀ, so the last dimension of x (4096) must equal in_features (8192); it doesn't, so the call fails. (s70 is just the symbolic sequence length from the trace.)
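To make the shape rule concrete, here is a minimal pure-Python sketch of the check that F.linear effectively performs. It is not PyTorch's implementation: the helper name linear_output_shape and the concrete SEQ_LEN standing in for the symbolic s70 are assumptions made for illustration only.

```python
def linear_output_shape(x_shape, weight_shape):
    """Return the shape of x @ W.T, or raise if the reduction dims disagree."""
    # nn.Linear weight layout is [out_features, in_features].
    out_features, in_features = weight_shape
    if x_shape[-1] != in_features:
        raise ValueError(
            f"input width {x_shape[-1]} != in_features {in_features}"
        )
    # Leading dims are preserved; the last dim becomes out_features.
    return tuple(x_shape[:-1]) + (out_features,)


SEQ_LEN = 7  # hypothetical stand-in for the symbolic sequence length s70

# Shapes from the trace: 4096-wide input against a [3840, 8192] weight.
try:
    linear_output_shape((1, SEQ_LEN, 4096), (3840, 8192))
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# The same weight accepts an 8192-wide input.
ok_shape = linear_output_shape((1, SEQ_LEN, 8192), (3840, 8192))

print(mismatch)
print(ok_shape)
```

With the shapes from the trace, the first call is rejected and the second yields a 3840‑wide output, which matches the reading of the error above.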

Where (the call stack)

vllm/model_executor/models/transformers/multimodal.py:348 forward
vllm/model_executor/models/transformers/base.py:653 forward
transformers/models/gemma4_unified/modeling_gemma4_unified.py:1121 forward ← the linear is here
This happens inside EngineCore._initialize_kv_caches → determine_available_memory — vLLM's startup dummy forward (it runs the model once under torch.compile/FakeTensors to size the KV cache), so it dies before any real inference.

What's actually wrong
The two key facts:

The stack goes through vllm/model_executor/models/transformers/… — i.e. vLLM has no native gemma4_unified implementation; it's running HF's modeling_gemma4_unified.py through its generic "Transformers backend" wrapper.
8192 = 2 × 4096. That Linear expects a concatenated/fused input (two 4096‑wide streams → 8192), which is exactly what the encoder‑free "unified" Gemma‑4 does internally (it projects raw audio/vision wave features into the text space and fuses them). vLLM's generic Transformers backend feeds it only a single 4096‑wide stream; it doesn't reproduce the unified model's fusion/concat step.
So the root cause is a vLLM↔Transformers integration gap for the gemma4_unified architecture: the generic backend mis‑drives the unified model's fused projection, feeding 4096 where the layer wants 8192.
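The 8192 = 2 × 4096 reasoning can be sketched as a small shape check. This only illustrates the hypothesis in this post (that the layer wants two 4096‑wide streams concatenated on the last dimension); it is not taken from the Transformers or vLLM code, and the helper names fused_stream_count and concat_last_dim are hypothetical.

```python
def fused_stream_count(in_features, stream_width):
    """Return how many stream_width-wide inputs fill in_features, else None."""
    if stream_width <= 0 or in_features % stream_width != 0:
        return None
    return in_features // stream_width


def concat_last_dim(*shapes):
    """Shape after concatenating tensors of these shapes along the last dim."""
    lead = tuple(shapes[0][:-1])
    if any(tuple(s[:-1]) != lead for s in shapes):
        raise ValueError("leading dimensions differ")
    return lead + (sum(s[-1] for s in shapes),)


SEQ_LEN = 7  # hypothetical stand-in for the symbolic sequence length s70

# The weight's in_features is an exact multiple of the stream width.
streams = fused_stream_count(8192, 4096)

# Two 4096-wide streams concatenated give the width the layer expects.
fused = concat_last_dim((1, SEQ_LEN, 4096), (1, SEQ_LEN, 4096))

print(streams)
print(fused)
```

An exact multiple is only a hint, not proof of how the model fuses its inputs; confirming it means reading the forward at the modeling_gemma4_unified.py line from the stack.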
