
Installed ! pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary but flash_llama is still erroring out

#25
by ajash - opened

I have installed all the required dependencies to run flash-attn:
! pip install flash-attn --no-build-isolation
! pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map='auto', trust_remote_code=True, torch_dtype=torch.bfloat16, revision="refs/pr/17")
This is not working. Error:

ImportError: Please install RoPE kernels: pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

I have already installed this dependency.
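
(A quick sanity check, independent of the model code, is to try the same import that modeling_flash_llama.py attempts; this just mirrors the failing line shown in the traceback below:)

# Sanity check: the exact import attempted by modeling_flash_llama.py.
# If this raises ModuleNotFoundError, the rotary kernels are not usable,
# regardless of how the model is loaded.
from flash_attn.layers.rotary import apply_rotary_emb_func
print("rotary kernels importable")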

Output of:
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map='auto', trust_remote_code=True, torch_dtype=torch.bfloat16)

Downloading (…)lve/main/config.json: 100% 709/709 [00:00<00:00, 62.2kB/s]
Downloading (…)eling_flash_llama.py: 100% 45.3k/45.3k [00:00<00:00, 3.74MB/s]
A new version of the following files was downloaded from https://huggingface.co/togethercomputer/LLaMA-2-7B-32K:
- modeling_flash_llama.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
>>>> Flash Attention installed

ModuleNotFoundError Traceback (most recent call last)
~/.cache/huggingface/modules/transformers_modules/togethercomputer/LLaMA-2-7B-32K/aef6d8946ae1015bdb65c478a2dd73b58daaef47/modeling_flash_llama.py in
51 try:
---> 52 from flash_attn.layers.rotary import apply_rotary_emb_func
53 flash_rope_installed = True

12 frames
ModuleNotFoundError: No module named 'flash_attn.ops.triton'

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
~/.cache/huggingface/modules/transformers_modules/togethercomputer/LLaMA-2-7B-32K/aef6d8946ae1015bdb65c478a2dd73b58daaef47/modeling_flash_llama.py in
55 except ImportError:
56 flash_rope_installed = False
---> 57 raise ImportError('Please install RoPE kernels: pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary')
58
59

ImportError: Please install RoPE kernels: pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

There's currently a bug in flash-attn. Try installing v2.1.1 for now:

pip install flash-attn==2.1.1 --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git@v2.1.1#subdirectory=csrc/rotary
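
(For completeness, a minimal reload sketch using the same arguments as in the original post, to run after reinstalling the pinned versions and restarting the runtime; MODEL_ID is assumed to be the repo this discussion is attached to, as seen in the traceback paths:)

# Restart the runtime after reinstalling the pinned versions, then reload the model.
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "togethercomputer/LLaMA-2-7B-32K"  # assumed from the traceback paths above

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)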

That worked... thanks!
How does one figure this out on their own? :)
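
(One rough way to diagnose this kind of mismatch yourself is to check which flash-attn build is actually installed and compare it against the tag you built the rotary kernels from; a minimal sketch:)

# Check the installed flash-attn distribution; the rotary kernels built from
# csrc/rotary need to come from the matching tag (hence the v2.1.1 pin above).
import importlib.metadata
print(importlib.metadata.version("flash-attn"))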
