Running this on CPU requires flash_attn! But we can't install flash_attn on CPU

#4
by Meshwa - opened

I just tried to run the code provided in the repo, but it throws this error:

Traceback (most recent call last):
  File "test_server.py", line 82, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_module_utils.py", line 501, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_module_utils.py", line 326, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_module_utils.py", line 181, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`

I ended up installing an older version to get it operational on my Windows machine:
pip install flash-attn===1.0.4 --no-build-isolation
Hope that works

Nope, still the same thing! I can't install flash_attn.

See, here are the logs:

pip install flash-attn===1.0.4 --no-build-isolation

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting flash-attn===1.0.4
  Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 4.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\vandu\AppData\Local\Temp\pip-install-4472dd_e\flash-attn_53d08ad4d8ec487babe7ba8ed3131e5a\setup.py", line 106, in <module>
          raise_if_cuda_home_none("flash_attn")
        File "C:\Users\vandu\AppData\Local\Temp\pip-install-4472dd_e\flash-attn_53d08ad4d8ec487babe7ba8ed3131e5a\setup.py", line 53, in raise_if_cuda_home_none
          raise RuntimeError(
      RuntimeError: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

      Warning: Torch did not find available GPUs on this system.
       If your intention is to cross-compile, this is not an error.
      By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
      Volta (compute capability 7.0), Turing (compute capability 7.5),
      and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
      If you wish to cross-compile for a single specific architecture,
      export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.



      torch.__version__  = 2.3.1+cpu


      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I don't know much either, but I think it only works if you have the CUDA build of PyTorch (an NVIDIA GPU is needed).

If you have one, you'll need to install the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads), then uninstall PyTorch with pip uninstall torch, and finally reinstall torch for the CUDA version you have (I have 12.5, but the highest CUDA version PyTorch supports is 12.4; it works fine): pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124 (installation command from https://pytorch.org/).

Then you should be able to pip install flash-attn. (Update: you will need to run pip install --upgrade pip setuptools wheel before the flash-attn installation command.)
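
As a quick sanity check before retrying the flash-attn build, you can confirm that the CUDA build of torch is actually active. A rough snippet, assuming torch has already been reinstalled:

import torch

# If these show a "+cpu" build, no CUDA version, or False,
# flash-attn's setup.py will fail the same way as in the log above.
print(torch.__version__)          # e.g. "2.3.1+cu124", not "...+cpu"
print(torch.version.cuda)         # CUDA version torch was built against, or None
print(torch.cuda.is_available())  # should be True on a working CUDA setup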

My problem is not with flash_attn.
It's with the model not running on CPU.

This is caused by the transformers dynamic_module_utils function get_imports mistakenly listing flash_attn as a requirement, even though it's not actually used or even loaded.

Exact same issue as discussed here: https://huggingface.co/microsoft/phi-1_5/discussions/72

The same workaround works for Florence2 as well:

# Workaround for the unnecessary flash_attn requirement
import os
from unittest.mock import patch

from transformers import AutoModelForCausalLM
from transformers.dynamic_module_utils import get_imports

def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    # Only touch the import list of Florence-2's modeling file
    if not str(filename).endswith("modeling_florence2.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    imports.remove("flash_attn")
    return imports

# Patch get_imports while loading, and force the SDPA attention implementation
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,              # path or repo id of the Florence-2 checkpoint
        attn_implementation="sdpa",
        torch_dtype=dtype,       # e.g. torch.float32 on CPU
        trust_remote_code=True,
    )

I'm using this with my ComfyUI node and it runs fine without flash_attn even being installed. I don't notice any performance difference either.
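
For anyone who wants to see the patch in context, here is a rough end-to-end sketch of CPU inference with the model loaded as above. The repo id, the "<CAPTION>" task token, and the generation settings are assumptions based on the Florence-2 model card, so adjust them to your setup:

# Minimal CPU inference sketch; "model" is the object created in the patch block above.
import torch
from PIL import Image
from transformers import AutoProcessor

model_id = "microsoft/Florence-2-large"   # assumed repo id, adjust as needed
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")         # any local test image
inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
        num_beams=3,
    )
print(processor.batch_decode(generated_ids, skip_special_tokens=False)[0])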

Oh, thanks buddy 😀. By the way, I was going to build a node for Comfy, but now that you've made it, I don't have to 😝. Thanks!

Meshwa changed discussion status to closed

Is there a way to run Florence-2 on GPU without flash_attn? I want to fine-tune this model.
