Adds support for MQA/GQA and attention mask during training/fine-tuning. 371fd51 gugarosa committed on Oct 30, 2023