ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`

#48
by Sulilili - opened

When I ran the Phi-3 demo inference code, this error occurred. I tried to install `flash_attn` according to the requirements, but the install failed with: RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.
torch.version = 1.11.0+cu113
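
For context, the CUDA toolkit a torch wheel was built against can be checked directly (a small diagnostic sketch; the values in the comments mirror the environment above):

import torch

# FlashAttention 2 needs a CUDA 11.6+ toolchain; this build targets 11.3.
print(torch.__version__)   # e.g. 1.11.0+cu113
print(torch.version.cuda)  # e.g. 11.3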

Microsoft org

Please note that the modeling file works without flash-attention:

# Transformers scans dependencies in the modeling file, causing issues on conditional loading. The regex only ignores try/catch blocks, but not if statements
# if is_flash_attn_2_available():
_flash_supports_window_size = False
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa

    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
except ImportError as error:
    logger.warning(
        f"`flash-attention` package not found, consider installing for better performance: {error}."
    )
    if not _flash_supports_window_size:
        logger.warning(
            "Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`."
        )

Something else in your environment might be requesting flash-attention.
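
For example, loading the checkpoint with the eager attention path sidesteps flash-attention entirely. A minimal loading sketch (the model id below is an assumption; substitute the repo you are actually using):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id for illustration; replace with the repo you are loading.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    attn_implementation="eager",  # use the non-flash attention implementation
)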

gugarosa changed discussion status to closed
