Using the flash attention option

#6
by lentan - opened

config.json seems to say the model is using torch attention, but switching it to flash attention raises an error saying it's unimplemented with ALiBi.

Edit: sorry, just use triton; it's in the README!
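
For anyone landing here later, a minimal sketch of the README approach: override `attn_impl` in the config before loading. The checkpoint name `mosaicml/mpt-7b` below is an assumption for illustration; substitute the model this discussion belongs to.

```python
import torch
import transformers

# Checkpoint name is an assumption; use the model you're actually loading.
name = 'mosaicml/mpt-7b'

# Load the config first so the attention implementation can be switched
# before the model is instantiated.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # 'flash' doesn't support ALiBi

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # triton kernels expect fp16/bf16 weights
    trust_remote_code=True,
)
```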

lentan changed discussion status to closed
Mosaic ML, Inc. org

You beat me to it :)
