Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval Paper • 2309.15129 • Published Sep 25, 2023 • 6
The Consensus Game: Language Model Generation via Equilibrium Search Paper • 2310.09139 • Published Oct 13, 2023 • 12
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise Paper • 2212.11685 • Published Dec 22, 2022 • 2
Levels of AGI: Operationalizing Progress on the Path to AGI Paper • 2311.02462 • Published Nov 4, 2023 • 33
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 603
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published Oct 3 • 47