Asking for help using flash-attn on ZeroGPU

#75
by Yiwen-ntu - opened

I'm trying to use flash-attn on a ZeroGPU Space but haven't figured out how.
The methods in https://discuss.huggingface.co/t/how-to-install-flash-attention-on-hf-gradio-space/70698 don't work, since ZeroGPU only supports Gradio.
Simply adding flash-attn to requirements.txt doesn't work either.

ZeroGPU Explorers org

Hello @Yiwen-ntu, in order to install flash-attn you must use the following code in your Gradio Space:

import os
import subprocess

# Install flash attention, skipping the CUDA build if necessary.
# Merging os.environ keeps PATH and the other variables the child shell
# needs to find pip.
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={**os.environ, "FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
    check=True,
)
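
(FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE tells flash-attn's setup to skip compiling the CUDA extension, which would otherwise fail on a build machine without a CUDA toolchain, and --no-build-isolation lets the build see the torch that is already installed in the Space.)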

Hi! Thanks for your reply! I have tried this, but it doesn't work: flash_attn is still not found by Transformers.
Is this because the Gradio environment has already been launched before this installation runs?

ZeroGPU Explorers org

Can you please provide the error you are getting, or give me a link to your Space? I also don't believe the issue is related to Gradio, but I will check.

Thanks a lot! I have found the problem: the installation code has to go at the very beginning of app.py, before any import that needs flash_attn. A sketch of what worked is below.
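
For anyone hitting the same issue, a minimal sketch of that ordering in app.py (the model name below is a placeholder, not from this thread):

import os
import subprocess

# Step 1: install flash-attn before anything imports it.
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={**os.environ, "FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
    check=True,
)

# Step 2: only now import the libraries that probe for flash_attn.
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any flash-attn-capable model works the same way.
model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
)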

ZeroGPU Explorers org

You're welcome! Enjoy.
