Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 25
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 26
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models Paper • 2501.05767 • Published Jan 10 • 29
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 28
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published Jan 16 • 28
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 29
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published Jan 11 • 31
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14 • 34
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published Jan 9 • 41
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published Jan 6 • 44
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 3 • 45
PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published Jan 17 • 48
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published Jan 16 • 48
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23 • 47
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published Jan 2 • 54
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published Jan 8 • 52
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 52