SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 7 days ago • 157
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting Paper • 2504.05541 • Published 7 days ago • 14
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 4 days ago • 18
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 2 days ago • 58
MultiVENT and MAGMAR Resources Collection Resources associated with the MultiVENT datasets, MAGMAR workshop, and other video retrieval and multimodal retrieval augmented generation • 5 items • Updated 10 days ago • 1
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 23
LLaVA-OneVision Collection a model good at arbitrary types of visual input • 15 items • Updated Oct 5, 2024 • 24
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25 • 26
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper • 2502.13962 • Published Feb 19 • 28
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion Paper • 2501.09019 • Published Jan 15 • 12
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding Paper • 2501.07888 • Published Jan 14 • 15
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations Paper • 2412.13171 • Published Dec 17, 2024 • 35
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 12
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published Sep 17, 2024 • 24
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling Paper • 2408.03695 • Published Aug 7, 2024 • 13
Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation Paper • 2308.07316 • Published Aug 14, 2023 • 7