Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 12 days ago • 17
Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published 13 days ago • 76
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Paper • 2412.04440 • Published Dec 5, 2024 • 20
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 44
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14, 2024 • 52
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2, 2024 • 14
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17, 2024 • 49
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8, 2024 • 37
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models Paper • 2407.05131 • Published Jul 6, 2024 • 27
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 55
VideoLLaMA2 Collection • Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated about 9 hours ago • 20
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition Paper • 2402.11452 • Published Feb 18, 2024 • 1
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding Paper • 2403.00425 • Published Mar 1, 2024 • 1