Run on MacBook without flash_attn?

by palebluewanders

Supposedly CPU is supported, right? I'm trying to run this on Apple silicon, but I can't get past the flash_attn requirement, which is NVIDIA-only. How can I get around this?

```
Traceback (most recent call last):
  File "/Users/demospace/Desktop/cbd/nanoLLaVA/nanoLLaVA.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/demospace/cbd/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/demospace/cbd/lib/python3.12/site-packages/transformers/dynamic_module_utils.py", line 489, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/demospace/cbd/lib/python3.12/site-packages/transformers/dynamic_module_utils.py", line 315, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/demospace/cbd/lib/python3.12/site-packages/transformers/dynamic_module_utils.py", line 180, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
```
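A minimal sketch of one possible workaround: the error comes from transformers statically scanning the remote modeling file's imports, so patching `get_imports` to drop `flash_attn` should let the checkpoint load without the package installed. `get_imports` is a real transformers internal; the repo id `qnguyen3/nanoLLaVA` below is an assumption.

```python
from unittest.mock import patch

from transformers import AutoModelForCausalLM
from transformers.dynamic_module_utils import get_imports


def patched_get_imports(filename):
    # check_imports() fails on flash_attn; drop it from the scanned imports
    imports = get_imports(filename)
    return [imp for imp in imports if imp != "flash_attn"]


with patch("transformers.dynamic_module_utils.get_imports", patched_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        "qnguyen3/nanoLLaVA",  # assumed repo id
        trust_remote_code=True,
    )
```

Note that this only skips the import check; at runtime the remote modeling code still has to take a non-flash attention path.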

Hi @palebluewanders, can you try this and see what it gives back? `FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn --no-build-isolation`

Hi, I'm on Windows and facing the same issue. Even after doing the above, I still can't install the flash-attn module.

```
Error limit reached.
100 errors detected in the compilation of "csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu".
Compilation terminated.
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc.exe' failed with exit code 4294967295
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
```

Is there a way to use a different attention implementation, e.g. SDPA? My script fails at the AutoModelForCausalLM line itself, so I can't select SDPA (as per this documentation) before it errors out.
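For what it's worth, a sketch of selecting SDPA once the import check is bypassed (e.g. with the patch above). This assumes the repo id `qnguyen3/nanoLLaVA` and that the remote modeling code honors `attn_implementation`; if it doesn't, `"eager"` is the safer fallback.

```python
import torch
from transformers import AutoModelForCausalLM

# Prefer Apple's Metal backend when available, otherwise fall back to CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    "qnguyen3/nanoLLaVA",        # assumed repo id
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # PyTorch scaled_dot_product_attention; try "eager" if unsupported
    trust_remote_code=True,
).to(device)
```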
