LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • arXiv:2411.10440 • Published Nov 2024 • 102 upvotes
LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Article • By mikelabs • Nov 2024 • 7 upvotes
Medical QA Datasets • Collection • A collection of medical question answering (QA) datasets • 20 items • Updated Nov 2024 • 24 upvotes
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning • Paper • arXiv:2410.21845 • Published Oct 2024 • 11 upvotes
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution • Paper • arXiv:2409.12191 • Published Sep 18, 2024 • 74 upvotes
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time • Paper • arXiv:2408.13233 • Published Aug 23, 2024 • 21 upvotes
Berkeley Humanoid: A Research Platform for Learning-based Control • Paper • arXiv:2407.21781 • Published Jul 31, 2024 • 8 upvotes
Gemma 2: Improving Open Language Models at a Practical Size • Paper • arXiv:2408.00118 • Published Jul 31, 2024 • 75 upvotes
Longhorn: State Space Models are Amortized Online Learners • Paper • arXiv:2407.14207 • Published Jul 19, 2024 • 17 upvotes
E5-V: Universal Embeddings with Multimodal Large Language Models • Paper • arXiv:2407.12580 • Published Jul 17, 2024 • 39 upvotes
CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization • Paper • arXiv:2407.10424 • Published Jul 15, 2024 • 6 upvotes
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models • Paper • arXiv:2407.07895 • Published Jul 10, 2024 • 40 upvotes