Windows: modeling_qwen2.py raises NotImplementedError for 'eager' attention with no automatic fallback

#17
by kanklc - opened

Environment

  • OS: Windows 11
  • GPU: RTX 4060 (8GB VRAM)
  • CUDA: 12.4
  • torch: 2.6.0+cu124
  • transformers: 4.51.3

Problem

On Windows with native Python (no Docker/WSL2), loading the model and calling
model.generate() raises:

NotImplementedError: self._attn_implementation='eager'

This happens because modeling_qwen2.py line 1335 raises NotImplementedError
for 'eager' attention, but the automatic fallback to 'sdpa' (lines 932-947)
does not trigger on Windows.

Passing attn_implementation='sdpa' to from_pretrained() does not fix it
either: the model loads but still uses 'eager' internally.
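
To confirm which implementation is actually in effect after loading, something like the following may help. It is only a diagnostic sketch: count_attn_implementations is a made-up helper name, and it assumes nothing beyond the model exposing named_modules() (as any torch.nn.Module does) and the _attn_implementation attributes already mentioned above.

```python
from collections import Counter


def count_attn_implementations(model):
    """Count the `_attn_implementation` values found on submodules and their configs.

    Hypothetical helper for diagnosis only; not part of transformers.
    """
    counts = Counter()
    for _name, module in model.named_modules():
        # Value stored directly on the module, if any.
        value = getattr(module, "_attn_implementation", None)
        if value is not None:
            counts[("module", value)] += 1
        # Value stored on the module's config, if the module has one.
        config = getattr(module, "config", None)
        cfg_value = getattr(config, "_attn_implementation", None)
        if cfg_value is not None:
            counts[("config", cfg_value)] += 1
    return counts
```

Calling count_attn_implementations(model) right after from_pretrained() and printing the result should show whether any 'eager' entries remain even though 'sdpa' was requested.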

Workaround

After loading the model, manually patch all submodules:

for module in model.modules():
    # Override the value stored on the module itself
    if hasattr(module, '_attn_implementation'):
        module._attn_implementation = 'sdpa'
    # Override the value stored on the module's config
    if hasattr(module, 'config') and hasattr(module.config, '_attn_implementation'):
        module.config._attn_implementation = 'sdpa'
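
For repeated use, the same patch could be wrapped in a small function that also reports what it touched. This is a sketch of the workaround above, not an official fix: force_attn_implementation is a hypothetical name, and it assumes the model provides named_modules().

```python
def force_attn_implementation(model, impl="sdpa"):
    """Set `_attn_implementation` to `impl` on every submodule and config that has it.

    Returns the names of the submodules whose own attribute was overridden,
    so the caller can check that the patch reached the attention layers.
    Hypothetical helper; not part of transformers.
    """
    patched = []
    for name, module in model.named_modules():
        if hasattr(module, "_attn_implementation"):
            module._attn_implementation = impl
            patched.append(name or "<root>")
        # Configs are often shared between submodules; setting the value
        # more than once on the same object is harmless.
        config = getattr(module, "config", None)
        if config is not None and hasattr(config, "_attn_implementation"):
            config._attn_implementation = impl
    return patched
```

Run it once after from_pretrained() and before generate(), e.g. patched = force_attn_implementation(model). An empty list would suggest the attribute lives somewhere else in this model.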

Suggestion

Either fix the fallback logic on Windows, or raise a clear error message
pointing users to this workaround instead of a cryptic NotImplementedError.
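
As a rough illustration of the behaviour suggested here, the selection logic could look something like this. Everything in the sketch is hypothetical: resolve_attn_implementation, its parameters, and the message text are invented for illustration and do not reflect how modeling_qwen2.py is actually structured.

```python
import warnings


def resolve_attn_implementation(requested, supported, fallback="sdpa"):
    """Pick an attention implementation, falling back or failing with a clear message.

    `requested` is the implementation asked for, `supported` is the collection
    of implementations the model code can run. Hypothetical sketch only.
    """
    if requested in supported:
        return requested
    if fallback in supported:
        # Fall back explicitly instead of failing later inside generate().
        warnings.warn(
            f"attn_implementation={requested!r} is not supported by this model; "
            f"falling back to {fallback!r}."
        )
        return fallback
    raise NotImplementedError(
        f"attn_implementation={requested!r} is not supported by this model and "
        f"no fallback is available. Supported values: {sorted(supported)}. "
        "As a workaround, set `_attn_implementation` on the model's submodules "
        "and configs after loading."
    )
```

With this shape, a request for 'eager' would either be redirected to 'sdpa' with a visible warning or fail with a message that names the supported options and the workaround.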
