Accelerating Language Model Inference with Mixture of Attentions
Brilliant stuff. Personally, I would love to see the discretization and the latest A initialization digested. Looking forward to the upcoming posts!
Is the 2.8B model the one trained on the Pile or on SlimPJ?