Submitted by jinjieni 75 MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures · 13 authors 2
Submitted by tyl5566 37 Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens · 9 authors 3
Submitted by FanBuCUHK 33 Roadmap towards Superhuman Speech Understanding using Large Language Models · 6 authors 2
Submitted by JamesZhutheThird 32 MobA: A Two-Level Agent System for Efficient Mobile Task Automation · 11 authors 3
Submitted by WuChengyue 32 Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation · 11 authors 4
Submitted by gentaiscool 30 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines · 51 authors 3
Submitted by weilllllls 24 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control · 12 authors 2
Submitted by richardxp888 21 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models · 9 authors 3
Submitted by zhoutianyi 20 BenTo: Benchmark Task Reduction with In-Context Transferability · 4 authors 3
Submitted by ZenMoore 19 PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment · 8 authors 2
Submitted by SiweiWu 17 A Comparative Study on Reasoning Patterns of OpenAI's o1 Model · 17 authors 2
Submitted by Tigerph 16 A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models · 8 authors 2
Submitted by hbseong 12 Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems · 2 authors 2
Submitted by akhaliq 12 VidPanos: Generative Panoramic Videos from Casual Panning Videos · 9 authors 2
Submitted by MING-ZCH 10 Can MLLMs Understand the Deep Implication Behind Chinese Images? · 21 authors 2
Submitted by Sreyan88 9 Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation · 7 authors 2
Submitted by Hoar012 8 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant · 5 authors 2
Submitted by KrithikV 8 MedMobile: A mobile-sized language model with expert-level clinical capabilities · 5 authors 2
Submitted by YaxinLuo 7 γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models · 7 authors 2
Submitted by ckzheng 7 MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization · 6 authors 2
Submitted by mshuaibi 6 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models · 9 authors 1
Submitted by Shiym 6 LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning · 7 authors 2
Submitted by Yingda 5 Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key · 6 authors 2
Submitted by arthurhero 5 Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats · 8 authors 2
Submitted by ChenDRAG 4 Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment · 4 authors 2
Submitted by markywg 3 TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration · 5 authors 2
Submitted by pdx97 2 SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation · 2 authors 2