Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 24 days ago
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024
FiT: Flexible Vision Transformer for Diffusion Model Paper • 2402.12376 • Published Feb 19, 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL Paper • 2402.08609 • Published Feb 13, 2024
BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1, 2024
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning Paper • 2401.16013 • Published Jan 29, 2024
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30, 2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024