
Bug on AMD MI250 with flash-attention

#13
by PierreColombo - opened

Hello,

Thanks a lot for the model, and congrats on publishing such a strong model.

The current model does not work on an AMD MI250 with flash attention:

Concretely: take an MI250 node and load the model with attn_implementation="flash_attention_2" (see the sketch below).

(Screenshot: Capture d’écran 2024-03-28 à 12.17.38.png, showing garbled generation output)

If you load without flash attention, it works. Other MoE models (Mixtral, Grok) seem to work fine!
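
A minimal repro sketch of the failing load path, assuming the databricks/dbrx-instruct checkpoint; the prompt and generation settings are illustrative, not from the original report:

```python
# Repro sketch: load dbrx with flash-attention 2 on an AMD MI250 node.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,                   # dbrx shipped with custom model code
    attn_implementation="flash_attention_2",  # dropping this line avoids the bug
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # garbled on MI250
```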

Flash-attention was installed from the ROCm fork, pinned to this commit:

-e git+https://github.com/ROCmSoftwarePlatform/flash-attention.git@ae7928c5aed53cf6e75cc792baa9126b2abfcf1a#egg=flash_attn
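
For completeness, a quick way to confirm which flash-attention build transformers actually sees (standard package introspection, nothing specific to this bug):

```python
# Sanity check of the installed flash-attention build.
import flash_attn
from transformers.utils import is_flash_attn_2_available

print(flash_attn.__version__)       # version of the ROCm flash-attention build
print(is_flash_attn_2_available())  # True if transformers will use FA2
```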

Congrats again, hopefully it will be working on @amd soon :)
Pierre

Btw, no problem on A100 :)

Databricks org

Does your setup work for other models with flash attention (e.g. Llama)? What error do you get?

Yes, it works for both inference and training.

No errors; the model just generates garbage output. You can check the screenshot.
