You Only Cache Once: Decoder-Decoder Architectures for Language Models
Paper
•
2405.05254
•
Published
•
10
Additional paper with faq, code and tips on: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf