winglian
add flash attn context for efficient training and attempt setting model to train mode:
8792199