Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 103
Retrieval Head Mechanistically Explains Long-Context Factuality Paper • 2404.15574 • Published Apr 24 • 2
Toward Inference-optimal Mixture-of-Expert Large Language Models Paper • 2404.02852 • Published Apr 3
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
Complexity-Based Prompting for Multi-Step Reasoning Paper • 2210.00720 • Published Oct 3, 2022 • 1
Decomposed Prompting: A Modular Approach for Solving Complex Tasks Paper • 2210.02406 • Published Oct 5, 2022
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning Paper • 2309.05653 • Published Sep 11, 2023 • 10
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis Paper • 2305.13230 • Published May 22, 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models Paper • 2305.08322 • Published May 15, 2023
Data-to-text Generation with Variational Sequential Planning Paper • 2202.13756 • Published Feb 28, 2022
Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE Paper • 2210.16407 • Published Oct 28, 2022
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15 • 23 • 7
Specializing Smaller Language Models towards Multi-Step Reasoning Paper • 2301.12726 • Published Jan 30, 2023 • 1