Adds support for flash-attn rotary embedding and fused dense layers. 90c38d9 gugarosa committed on Nov 1, 2023
Adds support for MQA/GQA and attention mask during training / fine-tuning. 371fd51 gugarosa committed on Oct 30, 2023
fix(phi-1): Checks length of `attention_mask` if it is passed as a direct tensor. 1f890f7 gugarosa committed on Sep 26, 2023