- Better & Faster Large Language Models via Multi-token Prediction (arXiv:2404.19737, Apr 2024)
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv:2405.00732, May 2024)
- Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts (arXiv:2310.05898, Oct 9, 2023)
- Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding (arXiv:2404.16710, Apr 2024)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219, Apr 2024)
- ORPO: Monolithic Preference Optimization without Reference Model (arXiv:2403.07691, Mar 12, 2024)
- Gecko: Versatile Text Embeddings Distilled from Large Language Models (arXiv:2403.20327, Mar 29, 2024)
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies (arXiv:2404.08197, Apr 12, 2024)
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839, Apr 11, 2024)
- Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation (arXiv:2404.04256, Apr 5, 2024)
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (arXiv:2404.02905, Apr 3, 2024)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (arXiv:2404.04125, Apr 4, 2024)
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model (arXiv:2404.01331, Mar 29, 2024)
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258, Apr 2, 2024)
- MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection (arXiv:2403.19888, Mar 29, 2024)
- MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions (arXiv:2403.19651, Mar 28, 2024)
- NEFTune: Noisy Embeddings Improve Instruction Finetuning (arXiv:2310.05914, Oct 9, 2023)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (arXiv:2205.05638, May 11, 2022)
- SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series (arXiv:2403.15360, Mar 22, 2024)
- HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models (arXiv:2403.13447, Mar 20, 2024)
- Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference (arXiv:2403.14520, Mar 21, 2024)
- YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (arXiv:2402.13616, Feb 21, 2024)
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (arXiv:2403.09629, Mar 14, 2024)
- Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM (arXiv:2403.07487, Mar 12, 2024)
- Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295, Mar 13, 2024)
- Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding (arXiv:2403.09626, Mar 14, 2024)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv:2403.09611, Mar 14, 2024)
- EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba (arXiv:2403.09977, Mar 15, 2024)
- CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models (arXiv:2402.15021, Feb 22, 2024)
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models (arXiv:2402.14289, Feb 22, 2024)
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066, Jan 11, 2024)
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862, Dec 28, 2023)
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model (arXiv:2401.02330, Jan 4, 2024)
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model (arXiv:2311.05348, Nov 9, 2023)
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents (arXiv:2311.05437, Nov 9, 2023)
- SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding (arXiv:2310.15308, Oct 23, 2023)
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams (arXiv:2310.08678, Oct 12, 2023)
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model (arXiv:2309.11568, Sep 20, 2023)
- SlimPajama-DC: Understanding Data Combinations for LLM Training (arXiv:2309.10818, Sep 19, 2023)
- Textbooks Are All You Need II: phi-1.5 technical report (arXiv:2309.05463, Sep 11, 2023)
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts (arXiv:2309.04354, Sep 8, 2023)
- Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP (arXiv:2210.04150, Oct 9, 2022)