- SnapKV: LLM Knows What You are Looking for Before Generation (arXiv:2404.14469, Apr 2024)
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619, Apr 2024)
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (arXiv:2404.07972, Apr 11, 2024)
- Simple and Scalable Strategies to Continually Pre-train Large Language Models (arXiv:2403.08763, Mar 13, 2024)
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent (arXiv:2403.10517, Mar 15, 2024)
- MoAI: Mixture of All Intelligence for Large Language and Vision Models (arXiv:2403.07508, Mar 12, 2024)
- Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU (arXiv:2403.06504, Mar 11, 2024)
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models (arXiv:2402.17177, Feb 27, 2024)
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting (arXiv:2402.07207, Feb 11, 2024)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (arXiv:2402.00159, Jan 31, 2024)
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (arXiv:2401.05566, Jan 10, 2024)
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models (arXiv:2401.06951, Jan 13, 2024)
- DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (arXiv:2401.08671, Jan 9, 2024)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning (arXiv:2312.14878, Dec 22, 2023)
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws (arXiv:2401.00448, Dec 31, 2023)
- MultiLoRA: Democratizing LoRA for Better Multi-Task Learning (arXiv:2311.11501, Nov 20, 2023)
- System 2 Attention (is something you might need too) (arXiv:2311.11829, Nov 20, 2023)
- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster (arXiv:2311.08263, Nov 14, 2023)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (arXiv:2311.03285, Nov 6, 2023)
- FlashDecoding++: Faster Large Language Model Inference on GPUs (arXiv:2311.01282, Nov 2, 2023)
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (arXiv:2310.16795, Oct 25, 2023)
- MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (arXiv:2310.09478, Oct 14, 2023)
- LongNet: Scaling Transformers to 1,000,000,000 Tokens (arXiv:2307.02486, Jul 5, 2023)
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger (arXiv:2310.09199, Oct 13, 2023)
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (arXiv:2309.14509, Sep 25, 2023)
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (arXiv:2309.12307, Sep 21, 2023)
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis (arXiv:2310.00426, Sep 30, 2023)
- Large Language Models Cannot Self-Correct Reasoning Yet (arXiv:2310.01798, Oct 3, 2023)
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning (arXiv:2310.03094, Oct 4, 2023)