SageMaker endpoint

#1
by MahmoudBL - opened

Thanks for the great work. How can I deploy this model to an AWS SageMaker endpoint with flash_attention_v2 disabled?

@MahmoudBL I haven't tried deploying on AWS SageMaker. What problem are you facing?

Hello Mohammed, thanks for your response.
The deployment itself works, but I cannot find a way to disable Flash Attention.

@MahmoudBL Flash Attention 2 is optional. You can remove it, along with the torch_dtype flag, and things should work fine.
Can you share the error you got?
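
A rough sketch of what that could look like, assuming the model is loaded with transformers' `from_pretrained` and that Flash Attention is requested through the `attn_implementation="flash_attention_2"` argument (older model cards may use `use_flash_attention_2=True` instead). The helper name `build_load_kwargs` is made up for illustration and is not part of any library. On SageMaker with a custom `inference.py`, a call like this would probably live inside `model_fn`, but I have not tested that setup.

```python
def build_load_kwargs(use_flash_attention: bool, dtype=None) -> dict:
    """Hypothetical helper: build the optional keyword arguments for from_pretrained."""
    kwargs = {}
    if use_flash_attention:
        # Flash Attention 2 only runs in half precision (fp16 / bf16),
        # so a dtype has to be passed together with it.
        if dtype is None:
            raise ValueError("pass a half-precision dtype when enabling Flash Attention 2")
        kwargs["attn_implementation"] = "flash_attention_2"
        kwargs["torch_dtype"] = dtype
    # With Flash Attention disabled, both arguments are left out and
    # transformers falls back to its default attention and dtype.
    return kwargs


# Usage sketch (assumption: the model class and model id come from the model card):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs(False))
```

With `use_flash_attention=False` the helper returns an empty dict, so the load call is the same as in the model card minus the two Flash Attention related arguments.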

MohamedRashad changed discussion status to closed
