Enables toggling `fused_dense`, `flash_rotary` and `attn_pdrop` in the configuration. 45f4b21 gugarosa committed on Nov 1, 2023
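A minimal sketch of what this commit exposes, assuming a dataclass-style stand-in for the repo's actual configuration class (the field names mirror the commit message; the class itself is illustrative):

```python
from dataclasses import dataclass


@dataclass
class PhiConfigSketch:
    # Whether to use flash-attn's fused dense (MLP) layers.
    fused_dense: bool = False
    # Whether to use flash-attn's fused rotary embedding kernel.
    flash_rotary: bool = False
    # Dropout probability applied to attention weights.
    attn_pdrop: float = 0.0


# All three knobs can now be flipped from the configuration alone.
config = PhiConfigSketch(fused_dense=True, flash_rotary=True, attn_pdrop=0.1)
```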
Adds support for flash-attn rotary embedding and fused dense layers. 0bbd68a gugarosa committed on Nov 1, 2023
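The usual pattern for optional flash-attn support is to fall back to plain PyTorch when the package is missing. `FusedDense` is a real flash-attn module; the wrapper below (`make_linear`) is an illustrative sketch, not the repo's exact wiring:

```python
import torch.nn as nn

try:
    from flash_attn.ops.fused_dense import FusedDense
except ImportError:
    FusedDense = None


def make_linear(in_features: int, out_features: int, fused_dense: bool) -> nn.Module:
    """Return flash-attn's FusedDense when requested and installed, else nn.Linear."""
    if fused_dense and FusedDense is not None:
        return FusedDense(in_features, out_features)
    return nn.Linear(in_features, out_features)
```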
Adds support for MQA/GQA and attention mask during training. de35f90 gugarosa committed on Oct 30, 2023
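In grouped-query attention (GQA) there are fewer key/value heads than query heads, and each KV head is shared by `n_heads // n_kv_heads` query heads; multi-query attention (MQA) is the special case of a single KV head. A hedged sketch of the standard KV-head expansion step (names are illustrative, not the repo's):

```python
import torch


def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, n_kv_heads, seq, head_dim) to (batch, n_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:
        return x  # MHA or already-expanded case: nothing to do.
    b, h, s, d = x.shape
    # Insert a repeat axis, broadcast it, then fold it into the head axis.
    return x[:, :, None, :, :].expand(b, h, n_rep, s, d).reshape(b, h * n_rep, s, d)
```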
Adding `_set_gradient_checkpointing` for compatibility (#22) 8091327 gugarosa vriveras committed on Oct 17, 2023
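Older transformers versions invoke a `_set_gradient_checkpointing(module, value)` hook on each submodule when `gradient_checkpointing_enable()` is called. A sketch of the compatibility shim, with a stand-in class rather than the repo's actual model:

```python
import torch.nn as nn


class PhiModelSketch(nn.Module):
    supports_gradient_checkpointing = True

    def __init__(self):
        super().__init__()
        self.gradient_checkpointing = False

    def _set_gradient_checkpointing(self, module: nn.Module, value: bool = False) -> None:
        # Called per submodule by transformers; flip the flag wherever it exists.
        if hasattr(module, "gradient_checkpointing"):
            module.gradient_checkpointing = value
```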
Add more precise license metadata (UI will be cleaner!) (#35) 8ab0f29 gugarosa julien-c committed on Sep 27, 2023
fix(phi-1_5): Checks length of `attention_mask` if it is passed as a direct tensor. f9f2ac7 gugarosa committed on Sep 26, 2023
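A hedged sketch of the kind of guard this fix describes: when the attention mask arrives as a raw tensor, validate its length against the current input sequence and keep only the trailing positions. Function and argument names are illustrative:

```python
import torch


def check_attention_mask(attention_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Ensure a directly-passed attention mask covers at least seq_len positions."""
    if attention_mask.shape[-1] < seq_len:
        raise ValueError(
            f"attention_mask length {attention_mask.shape[-1]} < sequence length {seq_len}"
        )
    # Keep only the last seq_len positions (relevant during incremental decoding,
    # where the mask may cover the full cached sequence).
    return attention_mask[..., -seq_len:]
```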