Running this on CPU requires flash_attn! But we can't install flash_attn on a CPU-only machine.
I just tried to run the code provided in the repo, but it throws this error:
Traceback (most recent call last):
  File "test_server.py", line 82, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_module_utils.py", line 501, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_module_utils.py", line 326, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\....\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_module_utils.py", line 181, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
I ended up installing an older version to get it operational on my Windows machine:
pip install flash-attn===1.0.4 --no-build-isolation
Hope that works
Nope, still the same thing! I can't install flash_attn.
See, here are the logs:
pip install flash-attn===1.0.4 --no-build-isolation
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting flash-attn===1.0.4
  Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 4.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\vandu\AppData\Local\Temp\pip-install-4472dd_e\flash-attn_53d08ad4d8ec487babe7ba8ed3131e5a\setup.py", line 106, in <module>
          raise_if_cuda_home_none("flash_attn")
        File "C:\Users\vandu\AppData\Local\Temp\pip-install-4472dd_e\flash-attn_53d08ad4d8ec487babe7ba8ed3131e5a\setup.py", line 53, in raise_if_cuda_home_none
          raise RuntimeError(
      RuntimeError: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

      Warning: Torch did not find available GPUs on this system.
      If your intention is to cross-compile, this is not an error.
      By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
      Volta (compute capability 7.0), Turing (compute capability 7.5),
      and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
      If you wish to cross-compile for a single specific architecture,
      export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.

      torch.__version__ = 2.3.1+cpu
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I don't know much either, but I think it only works if you have the CUDA build of PyTorch (an NVIDIA GPU will be needed).
If you have one, you'll need to install the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads), then uninstall PyTorch with pip uninstall torch, and finally install torch for the CUDA version you have (I have 12.5, but the max PyTorch CUDA version is 12.4; it works fine):
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
(installation command from https://pytorch.org/)
Then you should be able to pip install flash-attn.
(Update: you will need to run pip install --upgrade pip setuptools wheel before the flash-attn installation command.)
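Once PyTorch is reinstalled, a quick sanity check can confirm the CUDA build actually took before retrying flash-attn (a minimal sketch; the exact version strings will differ on your machine):

import torch

print(torch.__version__)          # should end in +cu124 or similar, not +cpu
print(torch.version.cuda)         # the CUDA version this build was compiled against
print(torch.cuda.is_available())  # the setup.py warning above fired because this was False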
My problem is not with flash_attn; it's with the model not running on CPU.
This is caused by the transformers dynamic_module_utils function get_imports mistakenly listing flash_attn as a requirement, even though it's not actually used or even loaded.
Exact same issue as discussed here: https://huggingface.co/microsoft/phi-1_5/discussions/72
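For context, get_imports just scans the module's source text for import statements, so an import that is only executed behind a runtime check is still reported as a hard requirement. A simplified sketch of the failure mode (not the actual transformers source):

import re

# A runtime-guarded import like this one is still picked up by a plain text
# scan, which is why check_imports() demands flash_attn on a CPU-only machine.
source = '''
if is_flash_attn_2_available():
    from flash_attn import flash_attn_func
'''
found = re.findall(r"^\s*(?:from|import)\s+(\w+)", source, flags=re.MULTILINE)
print(found)  # ['flash_attn']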
The same workaround works for Florence2 as well:
# workaround for unnecessary flash_attn requirement
import os
from unittest.mock import patch

import torch
from transformers import AutoModelForCausalLM
from transformers.dynamic_module_utils import get_imports

def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    # Only touch the import list of Florence-2's modeling file.
    if not str(filename).endswith("modeling_florence2.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    imports.remove("flash_attn")
    return imports

model_path = "microsoft/Florence-2-large"  # example checkpoint; use whichever you have
dtype = torch.float32                      # float32 for CPU

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):  # workaround for unnecessary flash_attn requirement
    model = AutoModelForCausalLM.from_pretrained(
        model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True
    )
I'm using this with my ComfyUI node and it's running fine without flash_attn even installed. I don't notice any performance difference either.
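If you want to verify it end to end, here's a minimal CPU smoke test adapted from the Florence-2 model card (model and model_path come from the snippet above; the "<CAPTION>" task prompt and the image URL are just examples):

import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=128,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=False)[0])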
Oh thanks buddy! By the way, I was going to build a node for Comfy, but now that you've made it, I don't have to. Thanks!
Is there a way to run Florence-2 on GPU without flash_attn? I want to fine-tune this model.