SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166, published Dec 23, 2023)
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU (arXiv:2312.12456, published Dec 16, 2023)
Cached Transformers: Improving Transformers with Differentiable Memory Cache (arXiv:2312.12742, published Dec 20, 2023)
Mini-GPTs: Efficient Large Language Models through Contextual Pruning (arXiv:2312.12682, published Dec 20, 2023)
LLM in a flash: Efficient Large Language Model Inference with Limited Memory (arXiv:2312.11514, published Dec 12, 2023)
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention (arXiv:2312.07987, published Dec 13, 2023)
Distributed Inference and Fine-tuning of Large Language Models Over The Internet (arXiv:2312.08361, published Dec 13, 2023)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752, published Dec 1, 2023)
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering (arXiv:2311.12775, published Nov 21, 2023)
Orca 2: Teaching Small Language Models How to Reason (arXiv:2311.11045, published Nov 18, 2023)
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (arXiv:2308.14352, published Aug 28, 2023)
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862, published Dec 28, 2023)
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications (arXiv:2312.16145, published Dec 26, 2023)
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math (arXiv:2312.17120, published Dec 28, 2023)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325, published Jan 2, 2024)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081, published Jan 8, 2024)
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data (arXiv:2309.11235, published Sep 20, 2023)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288, published Jul 18, 2023)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764, published Feb 27, 2024)
Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295, published Mar 13, 2024)
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting (arXiv:2403.08551, published Mar 13, 2024)
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258, published Apr 2, 2024)
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (arXiv:2404.07973, published Apr 11, 2024)
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143, published Apr 10, 2024)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427, published Feb 29, 2024)
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219, published Apr 2024)