ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
#48
by Sulilili - opened
When I ran the Phi-3 inference demo code, this error occurred. I tried to install `flash_attn` as the requirements suggest, but then I got a different error: `RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.`

My PyTorch version is `1.11.0+cu113`.
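A quick sketch (added for illustration, not part of the original report) to confirm which CUDA toolkit PyTorch was built against, since FlashAttention requires CUDA 11.6 or newer:

```python
# Illustrative check (not from the original report): FlashAttention's kernels
# require a PyTorch build compiled against CUDA >= 11.6.
import torch

print(torch.__version__)          # e.g. 1.11.0+cu113
print(torch.version.cuda)         # CUDA toolkit PyTorch was compiled with, e.g. 11.3
print(torch.cuda.is_available())  # whether a CUDA device is visible at runtime
```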
Please note that the modeling file works without flash-attention:
```python
# Transformers scans dependencies in the modeling file, causing issues on conditional
# loading. The regex only ignores try/catch blocks, but not if statements.
# if is_flash_attn_2_available():
_flash_supports_window_size = False
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa

    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
except ImportError as error:
    logger.warning(
        f"`flash-attention` package not found, consider installing for better performance: {error}."
    )
    if not _flash_supports_window_size:
        logger.warning(
            "Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`."
        )
```
Something else in your environment might be requesting flash-attention.
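If you want to make sure flash-attention is never requested, forcing the eager attention path at load time should avoid it. A minimal sketch (the model id, dtype, and generation settings below are assumptions for illustration, adjust to your setup):

```python
# Minimal sketch: load Phi-3 without flash-attention by forcing eager attention.
# The model id and generation settings are assumptions, not from this thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed model id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="eager",  # do not route through flash-attention
    trust_remote_code=True,
)

inputs = tokenizer("Hello, Phi-3!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```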
gugarosa changed discussion status to closed