FlashAttention fails to run on Tesla V100
#37
by dkwwww - opened
On a single Tesla V100, when testing the Qwen-7B model with use_flash_attn=True, I get the following warnings:
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
And inference ultimately fails with:
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
Does FlashAttention only run on GPUs of certain architectures?
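The error message itself answers this: the installed flash-attn build requires NVIDIA Ampere or newer GPUs, i.e. CUDA compute capability 8.0 or higher (the 8.0 threshold is inferred from the "Ampere GPUs or newer" wording above). Tesla V100 is compute capability 7.0 and T4 is 7.5, so both fall short. A minimal sketch for checking your own GPU before enabling the flag:

```python
import torch

# FlashAttention builds that raise "only supports Ampere GPUs or newer"
# need CUDA compute capability >= 8.0. V100 is 7.0, T4 is 7.5.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")

if major >= 8:
    print("This GPU should be able to run FlashAttention.")
else:
    print("Pre-Ampere GPU: disable FlashAttention and use standard attention.")
```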
jklj077 changed discussion status to closed
I'm on AWS EC2 with a g4dn instance (the GPU is a T4) and hit the same errors: first warnings that some components would run on the CPU, and finally "FlashAttention only supports Ampere GPUs or newer". Only after uninstalling flash-attention could I use the model normally.
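Uninstalling works, but a lighter-weight alternative is to leave flash-attn installed and simply turn off the flag the original post enabled. A minimal sketch, assuming the usual Qwen-7B loading path with trust_remote_code (the exact model ID "Qwen/Qwen-7B" is an assumption; the use_flash_attn flag itself comes from the original post):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# On pre-Ampere GPUs (V100, T4), disable FlashAttention at load time
# instead of uninstalling the flash-attn package.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    device_map="auto",
    trust_remote_code=True,
    use_flash_attn=False,  # fall back to the standard attention implementation
).eval()
```

This keeps flash-attn available for any Ampere-or-newer machines that share the same environment, while the unsupported GPU just takes the standard attention path.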