NotImplementedError

#12 opened by Soraheart1988

I have spun up an AWS SageMaker ml.g4dn.12xlarge instance (4x 16 GB GPUs). I tried the code that uses the accelerate library, but got the following error.

This is the section of code I modified, following the provided example.

from accelerate import infer_auto_device_map, load_checkpoint_and_dispatch

device_map = infer_auto_device_map(model, max_memory={0: '10GiB', 1: '10GiB', 2: '10GiB', 3: '10GiB', 'cpu': '48GiB'}, no_split_module_classes=['CogVLMDecoderLayer', 'TransformerLayer'])
model = load_checkpoint_and_dispatch(
    model,
    checkpoint,   # typically '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/balabala'
    device_map=device_map,
)
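For context, this snippet is preceded by the usual meta-device construction step from the CogVLM multi-GPU example; a minimal sketch of that part (the tokenizer id, model id, and bfloat16 dtype are assumptions on my side) looks like this:

import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Build an empty (meta-device) model first, so that infer_auto_device_map and
# load_checkpoint_and_dispatch above can place the real weights across the GPUs.
tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        'THUDM/cogvlm-chat-hf',
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )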

Error

NotImplementedError                       Traceback (most recent call last)
Cell In[3], line 44
     41 gen_kwargs = {"max_length": 2048, "do_sample": False}
     43 with torch.no_grad():
---> 44     outputs = model.generate(**inputs, **gen_kwargs)
    114     Raises:
   (...)
    118         AttentionOp: The best operator for the configuration
    119     """
--> 120     return _run_priority_list(
    121         "memory_efficient_attention_forward",
    122         _dispatch_fw_priority_list(inp, needs_gradient),
    123         inp,
    124     )
......
File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py:63, in _run_priority_list(name, priority_list, inp)
     61 for op, not_supported in zip(priority_list, not_supported_reasons):
     62     msg += "\n" + _format_not_supported_reasons(op, not_supported)
---> 63 raise NotImplementedError(msg)

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 1226, 16, 112) (torch.bfloat16)
     key         : shape=(1, 1226, 16, 112) (torch.bfloat16)
     value       : shape=(1, 1226, 16, 112) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`decoderF` is not supported because:
    attn_bias type is <class 'NoneType'>
    bf16 is only supported on A100+ GPUs
`flshattF@v2.3.2` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
    Only work on pre-MLIR triton for now
`cutlassF` is not supported because:
    bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    has custom scale
    bf16 is only supported on A100+ GPUs
    unsupported embed per head: 112
Please advise: what have I done wrong?
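For what it is worth, the error text itself points at the likely cause: g4dn instances use T4 GPUs (compute capability 7.5), and every xformers kernel in the list rejects bfloat16 on pre-A100 hardware. A minimal, untested sketch of the workaround I would try, loading in float16 instead of bfloat16, is below.

import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM

# Assumption: switch the whole pipeline from bfloat16 to float16, since the
# T4s reject bf16 in every kernel listed in the traceback above.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        'THUDM/cogvlm-chat-hf',
        torch_dtype=torch.float16,      # was torch.bfloat16
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )
# The image tensors passed to model.generate() need the same dtype, e.g.
# images=[[image_tensor.to('cuda').to(torch.float16)]].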
Soraheart1988 changed discussion title from "Error at load_checkpoint_and_dispatch(" to "NotImplementedError"
chenkq (Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org) changed discussion status to closed
