L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? Paper • 2410.02115 • Published Oct 2024 • 10
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens Paper • 2401.17377 • Published Jan 30, 2024 • 34
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30, 2024 • 18
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers Paper • 2311.10642 • Published Nov 17, 2023 • 23
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26, 2023 • 69
Teaching Language Models to Self-Improve through Interactive Demonstrations Paper • 2310.13522 • Published Oct 20, 2023 • 11
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
Multimodal Foundation Models: From Specialists to General-Purpose Assistants Paper • 2309.10020 • Published Sep 18, 2023 • 40
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch Paper • 2309.10706 • Published Sep 19, 2023 • 16
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models Paper • 2309.09506 • Published Sep 18, 2023 • 14