Memory Augmented Language Models through Mixture of Word Experts Paper • 2311.10768 • Published Nov 15, 2023
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023
Time is Encoded in the Weights of Finetuned Language Models Paper • 2312.13401 • Published Dec 20, 2023
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024