Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 2024 • 55
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 2024 • 55
Jamba: A Hybrid Transformer-Mamba Language Model Paper • 2403.19887 • Published Mar 2024 • 96
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline Paper • 2404.02893 • Published Apr 2024 • 19
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper • 2403.12968 • Published Mar 19, 2024 • 20
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14, 2024 • 53
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 37
Common 7B Language Models Already Possess Strong Math Capabilities Paper • 2403.04706 • Published Mar 7, 2024 • 15
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 170
A Critical Evaluation of AI Feedback for Aligning Large Language Models Paper • 2402.12366 • Published Feb 19, 2024 • 3
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 560
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 87
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16, 2024 • 23
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23, 2024 • 19
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning Paper • 2402.04833 • Published Feb 7, 2024 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6, 2024 • 102
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29, 2024 • 45
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 26
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses Paper • 2312.00763 • Published Dec 1, 2023 • 18
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 38
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention Paper • 2311.15786 • Published Nov 27, 2023 • 7
Memory Augmented Language Models through Mixture of Word Experts Paper • 2311.10768 • Published Nov 15, 2023 • 16
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers Paper • 2311.10642 • Published Nov 17, 2023 • 23
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery Paper • 2310.18356 • Published Oct 24, 2023 • 22
A Simple and Effective Pruning Approach for Large Language Models Paper • 2306.11695 • Published Jun 20, 2023 • 2
NEFTune: Noisy Embeddings Improve Instruction Finetuning Paper • 2310.05914 • Published Oct 9, 2023 • 13
Sparse Finetuning for Inference Acceleration of Large Language Models Paper • 2310.06927 • Published Oct 10, 2023 • 14
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 30
The Impact of Depth and Width on Transformer Language Model Generalization Paper • 2310.19956 • Published Oct 30, 2023 • 8
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks Paper • 2309.17410 • Published Sep 29, 2023 • 4
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023 • 4
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 13
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Paper • 2309.00398 • Published Sep 1, 2023 • 18