- NVLM: Open Frontier-Class Multimodal LLMs
  Paper • 2409.11402 • Published • 72
- BRAVE: Broadening the visual encoding of vision-language models
  Paper • 2404.07204 • Published • 18
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 45
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 104
Collections
Collections including paper arxiv:2409.01704
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 138
- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 88
- Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
  Paper • 2409.02634 • Published • 90
- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 108

- Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
  Paper • 2408.15998 • Published • 84
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
  Paper • 2409.01704 • Published • 83
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
  Paper • 2408.06195 • Published • 63
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
  Paper • 2405.06682 • Published • 3