Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published 5 days ago • 34
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models • Paper • 2412.09645 • Published 7 days ago • 24
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 4 days ago • 115
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition • Paper • 2412.09501 • Published 5 days ago • 41
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions • Paper • 2412.09596 • Published 5 days ago • 86
POINTS1.5: Building a Vision-Language Model towards Real World Applications • Paper • 2412.08443 • Published 6 days ago • 36
ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • 2412.06559 • Published 8 days ago • 62
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases • Paper • 2412.04862 • Published 12 days ago • 46
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment • Paper • 2412.04814 • Published 12 days ago • 44
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • 2412.05271 • Published 11 days ago • 110
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection • Paper • 2412.04455 • Published 12 days ago • 35
VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467 • Published 12 days ago • 103
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion • Paper • 2412.04424 • Published 12 days ago • 52
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding • Paper • 2412.00493 • Published 17 days ago • 15
PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published 13 days ago • 116
Material Anything: Generating Materials for Any 3D Object via Diffusion • Paper • 2411.15138 • Published 25 days ago • 42
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • Paper • 2411.15124 • Published 25 days ago • 55