NotImplementedError
#12
opened by Soraheart1988
I have spun up an AWS SageMaker ml.g4dn.12xlarge instance (4× 16 GB GPUs) and tried the code that uses the accelerate library, but I got the following error.
Here is the section of code I modified, following the provided example:
from accelerate import infer_auto_device_map, load_checkpoint_and_dispatch

device_map = infer_auto_device_map(model, max_memory={0: '10GiB', 1: '10GiB', 2: '10GiB', 3: '10GiB', 'cpu': '48GiB'}, no_split_module_classes=['CogVLMDecoderLayer', 'TransformerLayer'])
model = load_checkpoint_and_dispatch(
    model,
    checkpoint,  # typically '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/balabala'
    device_map=device_map,
)
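For what it's worth, the placement that infer_auto_device_map computed can be inspected before running generation; a minimal sketch (device_map is just the dict returned above, mapping module names to device ids):

# Sketch: print which device (GPU index or 'cpu') each module group landed on.
for module_name, device in device_map.items():
    print(f'{module_name} -> {device}')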
Error
NotImplementedError Traceback (most recent call last)
Cell In[3], line 44
41 gen_kwargs = {"max_length": 2048, "do_sample": False}
43 with torch.no_grad():
---> 44 outputs = model.generate(**inputs, **gen_kwargs)
114 Raises:
(...)
118 AttentionOp: The best operator for the configuration
119 """
--> 120 return _run_priority_list(
121 "memory_efficient_attention_forward",
122 _dispatch_fw_priority_list(inp, needs_gradient),
123 inp,
124 )
......
File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py:63, in _run_priority_list(name, priority_list, inp)
61 for op, not_supported in zip(priority_list, not_supported_reasons):
62 msg += "\n" + _format_not_supported_reasons(op, not_supported)
---> 63 raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 1226, 16, 112) (torch.bfloat16)
key : shape=(1, 1226, 16, 112) (torch.bfloat16)
value : shape=(1, 1226, 16, 112) (torch.bfloat16)
attn_bias : <class 'NoneType'>
p : 0.0
`decoderF` is not supported because:
attn_bias type is <class 'NoneType'>
bf16 is only supported on A100+ GPUs
`flshattF@v2.3.2` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
operator wasn't built - see `python -m xformers.info` for more info
triton is not available
requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
Only work on pre-MLIR triton for now
`cutlassF` is not supported because:
bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
dtype=torch.bfloat16 (supported: {torch.float32})
has custom scale
bf16 is only supported on A100+ GPUs
unsupported embed per head: 112
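For reference, all of the rejected backends point at the same root cause: the ml.g4dn.12xlarge instance has NVIDIA T4 GPUs with compute capability (7, 5), and the model is running in torch.bfloat16, which xformers only supports on A100-class (compute capability 8.0+) hardware. A minimal sketch to confirm this on the instance (assumes only that torch is installed):

import torch

# T4 GPUs report compute capability (7, 5); the bf16 attention kernels
# listed in the error all require (8, 0) or newer (A100/H100/L4).
print(torch.cuda.get_device_capability(0))  # -> (7, 5) on ml.g4dn.12xlarge
print(torch.cuda.is_bf16_supported())       # -> False on T4

If that is the cause, a hypothetical workaround (not verified with CogVLM) would be to construct the model with torch_dtype=torch.float16 instead of torch.bfloat16, since the cutlassF backend above is rejected only for using bf16 on this hardware.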
Please advise: what have I done wrong?
Soraheart1988 changed discussion title from "Error at load_checkpoint_and_dispatch(" to "NotImplementedError"
chenkq changed discussion status to closed