MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 68
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5 • 38
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4 • 60
LLM Augmented LLMs: Expanding Capabilities through Composition Paper • 2401.02412 • Published Jan 4 • 35
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model Paper • 2401.02330 • Published Jan 4 • 11
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 176
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 61
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 25
A Comprehensive Study of Knowledge Editing for Large Language Models Paper • 2401.01286 • Published Jan 2 • 15
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 77
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11 • 36
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones Paper • 2312.16862 • Published Dec 28, 2023 • 28
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math Paper • 2312.17120 • Published Dec 28, 2023 • 25
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices Paper • 2312.16886 • Published Dec 28, 2023 • 19
Human101: Training 100+FPS Human Gaussians in 100s from 1 View Paper • 2312.15258 • Published Dec 23, 2023 • 6
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 49
Reasons to Reject? Aligning Language Models with Judgments Paper • 2312.14591 • Published Dec 22, 2023 • 16
Time is Encoded in the Weights of Finetuned Language Models Paper • 2312.13401 • Published Dec 20, 2023 • 18
TinySAM: Pushing the Envelope for Efficient Segment Anything Model Paper • 2312.13789 • Published Dec 21, 2023 • 13
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023 • 34
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 39
PromptBench: A Unified Library for Evaluation of Large Language Models Paper • 2312.07910 • Published Dec 13, 2023 • 14
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations Paper • 2312.06674 • Published Dec 7, 2023 • 5
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 26
Evaluation of Large Language Models for Decision Making in Autonomous Driving Paper • 2312.06351 • Published Dec 11, 2023 • 5
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper • 2312.03818 • Published Dec 6, 2023 • 31
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 50
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator Paper • 2312.04474 • Published Dec 7, 2023 • 28
Pearl: A Production-ready Reinforcement Learning Agent Paper • 2312.03814 • Published Dec 6, 2023 • 14
OneLLM: One Framework to Align All Modalities with Language Paper • 2312.03700 • Published Dec 6, 2023 • 20
LivePhoto: Real Image Animation with Text-guided Motion Control Paper • 2312.02928 • Published Dec 5, 2023 • 15
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models Paper • 2312.02969 • Published Dec 5, 2023 • 12
Training Chain-of-Thought via Latent-Variable Inference Paper • 2312.02179 • Published Nov 28, 2023 • 8
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 132
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs Paper • 2311.13600 • Published Nov 22, 2023 • 41
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023 • 25
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 69
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems Paper • 2311.11315 • Published Nov 19, 2023 • 6
ToolTalk: Evaluating Tool-Usage in a Conversational Setting Paper • 2311.10775 • Published Nov 15, 2023 • 7
ProAgent: From Robotic Process Automation to Agentic Process Automation Paper • 2311.10751 • Published Nov 2, 2023 • 8
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 Paper • 2311.10702 • Published Nov 17, 2023 • 17
SelfEval: Leveraging the discriminative nature of generative models for evaluation Paper • 2311.10708 • Published Nov 17, 2023 • 14
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 55
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks Paper • 2311.09835 • Published Nov 16, 2023 • 7
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models Paper • 2311.08692 • Published Nov 15, 2023 • 12
Instruction-Following Evaluation for Large Language Models Paper • 2311.07911 • Published Nov 14, 2023 • 17
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster Paper • 2311.08263 • Published Nov 14, 2023 • 14
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 33
LayoutPrompter: Awaken the Design Ability of Large Language Models Paper • 2311.06495 • Published Nov 11, 2023 • 9
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs Paper • 2311.05657 • Published Nov 9, 2023 • 26
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 76
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 40
Prompt Cache: Modular Attention Reuse for Low-Latency Inference Paper • 2311.04934 • Published Nov 7, 2023 • 23
Levels of AGI: Operationalizing Progress on the Path to AGI Paper • 2311.02462 • Published Nov 4, 2023 • 31
S-LoRA: Serving Thousands of Concurrent LoRA Adapters Paper • 2311.03285 • Published Nov 6, 2023 • 27
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning Paper • 2311.02103 • Published Nov 1, 2023 • 15
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning Paper • 2311.02303 • Published Nov 4, 2023 • 4
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Paper • 2311.03354 • Published Nov 6, 2023 • 4
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation Paper • 2311.00272 • Published Nov 1, 2023 • 8
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B Paper • 2310.20624 • Published Oct 31, 2023 • 12
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery Paper • 2310.18356 • Published Oct 24, 2023 • 22
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise Paper • 2310.19019 • Published Oct 29, 2023 • 9
JudgeLM: Fine-tuned Large Language Models are Scalable Judges Paper • 2310.17631 • Published Oct 26, 2023 • 31
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models Paper • 2310.16795 • Published Oct 25, 2023 • 26
InstructExcel: A Benchmark for Natural Language Instruction in Excel Paper • 2310.14495 • Published Oct 23, 2023 • 1
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models Paper • 2310.13127 • Published Oct 19, 2023 • 10
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search Paper • 2310.13227 • Published Oct 20, 2023 • 11
Tuna: Instruction Tuning using Feedback from Large Language Models Paper • 2310.13385 • Published Oct 20, 2023 • 8
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 33
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection Paper • 2310.11511 • Published Oct 17, 2023 • 65
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models Paper • 2310.08659 • Published Oct 12, 2023 • 20
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 51
Lemur: Harmonizing Natural Language and Code for Language Agents Paper • 2310.06830 • Published Oct 10, 2023 • 29
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning Paper • 2310.03731 • Published Oct 5, 2023 • 26
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines Paper • 2310.03714 • Published Oct 5, 2023 • 28
SmartPlay : A Benchmark for LLMs as Intelligent Agents Paper • 2310.01557 • Published Oct 2, 2023 • 12
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 50
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation Paper • 2401.10838 • Published Jan 19 • 8
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Paper • 2401.11708 • Published Jan 22 • 28
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment Paper • 2401.12474 • Published Jan 23 • 33
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23 • 31
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models Paper • 2401.12522 • Published Jan 23 • 11
MM-LLMs: Recent Advances in MultiModal Large Language Models Paper • 2401.13601 • Published Jan 24 • 41
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24 • 64
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25 • 45
Genie: Achieving Human Parity in Content-Grounded Datasets Generation Paper • 2401.14367 • Published Jan 25 • 6
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 46
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 47
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29 • 16
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning Paper • 2401.16013 • Published Jan 29 • 19
SymbolicAI: A framework for logic-based approaches combining generative models and solvers Paper • 2402.00854 • Published Feb 1 • 18
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12 • 39
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Paper • 2402.07033 • Published Feb 10 • 16
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts Paper • 2402.07625 • Published Feb 12 • 10
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts Paper • 2402.09727 • Published Feb 15 • 35
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15 • 11
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 33
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows Paper • 2402.10379 • Published Feb 16 • 27
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models Paper • 2402.10524 • Published Feb 16 • 20
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling Paper • 2402.10466 • Published Feb 16 • 16
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Paper • 2402.10294 • Published Feb 15 • 20
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement Paper • 2402.14658 • Published Feb 22 • 78
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21 • 43
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22 • 17
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming Paper • 2402.14261 • Published Feb 22 • 10
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 106
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting Paper • 2402.13720 • Published Feb 21 • 4
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31 • 55
Specialized Language Models with Cheap Inference from Limited Domain Data Paper • 2402.01093 • Published Feb 2 • 45
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2 • 31
Nomic Embed: Training a Reproducible Long Context Text Embedder Paper • 2402.01613 • Published Feb 2 • 13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 66
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing Paper • 2402.02583 • Published Feb 4 • 7
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6 • 106
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks Paper • 2402.04248 • Published Feb 6 • 25
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6 • 9
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models Paper • 2402.03749 • Published Feb 6 • 9
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs Paper • 2402.04291 • Published Feb 6 • 48
ScreenAI: A Vision-Language Model for UI and Infographics Understanding Paper • 2402.04615 • Published Feb 7 • 33
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text Paper • 2402.04379 • Published Feb 6 • 7
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay Paper • 2402.04858 • Published Feb 7 • 14
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains Paper • 2402.05140 • Published Feb 6 • 18
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8 • 35
Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model Paper • 2310.08072 • Published Oct 12, 2023 • 1
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20 • 46
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16 • 74
Speculative Streaming: Fast LLM Inference without Auxiliary Models Paper • 2402.11131 • Published Feb 16 • 41
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 37
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning Paper • 2312.15685 • Published Dec 25, 2023 • 16
SelectLLM: Can LLMs Select Important Instructions to Annotate? Paper • 2401.16553 • Published Jan 29 • 3
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning Paper • 2402.04833 • Published Feb 7 • 6
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications Paper • 2402.07927 • Published Feb 5 • 1
Simple linear attention language models balance the recall-throughput tradeoff Paper • 2402.18668 • Published Feb 28 • 18
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Paper • 2402.17553 • Published Feb 27 • 21
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27 • 19
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding Paper • 2402.16671 • Published Feb 26 • 26
Seamless Human Motion Composition with Blended Positional Encodings Paper • 2402.15509 • Published Feb 23 • 13
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 92
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5 • 16
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets Paper • 2403.03194 • Published Mar 5 • 11
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 73
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13 • 48
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 73
Recurrent Drafter for Fast Speculative Decoding in Large Language Models Paper • 2403.09919 • Published Mar 14 • 19
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19 • 28
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper • 2403.12968 • Published Mar 19 • 23
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text Paper • 2403.18421 • Published Mar 27 • 21
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published 23 days ago • 35
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published 23 days ago • 21
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published 1 day ago • 22
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published 6 days ago • 19