changed use_flash_attention_2=True to attn_implementation="flash_attention_2"

#53
by macadeliccc - opened

I receive this warning when using use_flash_attention_2=True:

The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
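For reference, a minimal sketch of the deprecated call that triggers this warning (the placeholder values assigned to model_id and torch_dtype below are illustrative, not taken from this thread):

import torch
from transformers import AutoModelForSpeechSeq2Seq

model_id = "<your-model-repo-id>"  # placeholder: substitute the checkpoint you load
torch_dtype = torch.float16        # Flash Attention 2 kernels expect fp16/bf16

# Deprecated keyword: emits the deprecation warning quoted above
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, use_flash_attention_2=True
)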

Passing attn_implementation="flash_attention_2" instead silences the warning:

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation="flash_attention_2"
)

That will only work for people on the latest release. Let's keep it as it is for now.

