GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published 5 days ago
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models Paper • 2407.05131 • Published Jul 6
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated 20 days ago
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition Paper • 2402.11452 • Published Feb 18
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding Paper • 2403.00425 • Published Mar 1