The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 29 days ago • 184
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 3 • 42
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 75
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 57
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Paper • 2501.13452 • Published Jan 23 • 7
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing Paper • 2501.13925 • Published Jan 23 • 7
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation Paper • 2403.14614 • Published Mar 21, 2024 • 4
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published Jan 24 • 31
Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models Paper • 2501.04828 • Published Jan 8 • 11
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering Paper • 2408.07888 • Published Aug 15, 2024 • 13