matlok's Collections
LMM

Papers - Attention - Mixture of Attention Heads (MoA)

Generalized multi-head attention using RoPE
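
Since the note only names the techniques, here is a minimal sketch of Rotary Position Embedding (RoPE) applied to a single attention head's queries and keys, in NumPy. The shapes, the base frequency of 10000, and the head dimension are illustrative assumptions, not details taken from the MoA paper or this collection.

```python
# Minimal RoPE sketch (assumptions: toy shapes, base=10000, one head).
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate channel pairs of x by position-dependent angles.

    x: (seq_len, head_dim) with an even head_dim
    positions: (seq_len,) integer token positions
    """
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "RoPE needs an even head dimension"
    # Per-pair inverse frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)   # (head_dim/2,)
    angles = positions[:, None] * inv_freq[None, :]              # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                              # even / odd channels
    # Apply a 2-D rotation to each (x1, x2) pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before the dot product, so the
# attention scores depend on relative token positions.
q = np.random.randn(8, 64)          # (seq_len=8, head_dim=64), toy values
k = np.random.randn(8, 64)
pos = np.arange(8)
scores = rope(q, pos) @ rope(k, pos).T / np.sqrt(64)
```

In a Mixture-of-Attention-Heads setup, each expert head would apply the same rotation to its own query/key projections; the routing over heads is a separate mechanism described in the MoA paper.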