ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning Paper • 2503.22738 • Published 28 days ago • 16
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388 • Published 23 days ago • 30
Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published Feb 27 • 19
Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published Feb 26 • 84
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Paper • 2412.04440 • Published Dec 5, 2024 • 21
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 48
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14, 2024 • 53
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2, 2024 • 14
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17, 2024 • 52
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8, 2024 • 38
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models Paper • 2407.05131 • Published Jul 6, 2024 • 28
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 57
VideoLLaMA2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated Mar 11 • 19
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition Paper • 2402.11452 • Published Feb 18, 2024 • 1
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding Paper • 2403.00425 • Published Mar 1, 2024 • 1