An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 57
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024 • 34
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 40
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 25
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination Paper • 2406.05132 • Published Jun 7, 2024 • 28
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10, 2024 • 24
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10, 2024 • 37