Retrieval Head Mechanistically Explains Long-Context Factuality Paper • 2404.15574 • Published Apr 24, 2024
Toward Inference-optimal Mixture-of-Expert Large Language Models Paper • 2404.02852 • Published Apr 3, 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29, 2024
Decomposed Prompting: A Modular Approach for Solving Complex Tasks Paper • 2210.02406 • Published Oct 5, 2022
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning Paper • 2309.05653 • Published Sep 11, 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis Paper • 2305.13230 • Published May 22, 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models Paper • 2305.08322 • Published May 15, 2023
Data-to-text Generation with Variational Sequential Planning Paper • 2202.13756 • Published Feb 28, 2022
Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE Paper • 2210.16407 • Published Oct 28, 2022
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024
Specializing Smaller Language Models towards Multi-Step Reasoning Paper • 2301.12726 • Published Jan 30, 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance Paper • 2305.17306 • Published May 26, 2023
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback Paper • 2305.10142 • Published May 17, 2023