Instead of flash_attn it should be flash_attn_2_cuda. This is causing a deployment issue in TGI/DJL
#14 · opened by monuminu
```python
from flash_attn.flash_attn_interface import (
    flash_attn_func,
    flash_attn_kvpacked_func,
    flash_attn_qkvpacked_func,
    flash_attn_varlen_kvpacked_func,
)
```
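For anyone hitting this in TGI/DJL: a common way to keep the modeling code loadable when the compiled flash_attn_2_cuda extension is missing is to guard the import. The sketch below is an illustration, not the repository's actual code; the `HAS_FLASH_ATTN` flag name is a hypothetical placeholder.

```python
# Minimal sketch: guard the flash-attn import so the model still loads
# when the compiled flash_attn_2_cuda extension is not installed.
# HAS_FLASH_ATTN is a hypothetical flag name, not from the original post.
try:
    from flash_attn.flash_attn_interface import (
        flash_attn_func,
        flash_attn_kvpacked_func,
        flash_attn_qkvpacked_func,
        flash_attn_varlen_kvpacked_func,
    )

    HAS_FLASH_ATTN = True
except ImportError:
    # flash-attn v2 ships its kernels as the flash_attn_2_cuda extension;
    # importing flash_attn raises ImportError when that extension is absent.
    HAS_FLASH_ATTN = False
```

Call sites would then branch on `HAS_FLASH_ATTN` and fall back to a standard attention path.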
Hi @monuminu, thanks for bringing this up! Can you provide some more details about the issues this is causing?