Submitted by LXT 46 OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding · 8 authors 2
Submitted by xinlai 32 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs · 6 authors 2
Submitted by multimodalart 29 MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data · 2 authors 2
Submitted by TranSirius 25 SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation · 8 authors 1
Submitted by TranSirius 21 Aligning Teacher with Student Preferences for Tailored Training Data Generation · 6 authors 2
Submitted by Foxfi 12 MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression · 13 authors 4
Submitted by xw-eric 9 Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding · 9 authors 1
Submitted by mbrack 7 T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings · 5 authors 1
Submitted by akhaliq 5 AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models · 12 authors 4
Submitted by dongguanting 5 Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation · 6 authors 3
Submitted by ahmedheakl 4 ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs · 5 authors 2
Submitted by ahmedheakl 2 ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models · 5 authors 1