TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published 6 days ago • 40
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 26 days ago • 319
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published Jan 9 • 15
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024 • 11
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024 • 11
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension Paper • 2412.03704 • Published Dec 4, 2024 • 7
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Paper • 2410.23277 • Published Oct 30, 2024 • 9
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Paper • 2410.23277 • Published Oct 30, 2024 • 9
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1, 2024 • 13
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1, 2024 • 13
VideoGUI: A Benchmark for GUI Automation from Instructional Videos Paper • 2406.10227 • Published Jun 14, 2024 • 9