- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 21
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 9
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 33
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections including paper arxiv:2405.12107
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 75
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 51
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 29
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 55
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 59
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 22
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 25
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 34
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 18
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 33
- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 15
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 572
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 94
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 42
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
  Paper • 2401.16158 • Published • 16
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 81
- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 23
- Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
  Paper • 2406.01014 • Published • 29
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 10
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41