SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published about 23 hours ago • 70
AI for Disability Collection A collection of datasets, models, Spaces and papers that use AI to address a disability-related topic. • 3 items • Updated Dec 29, 2024 • 3
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated Feb 20 • 251
Multimodal Chaptering for Long-Form TV Newscast Video Paper • 2406.17590 • Published Mar 20, 2024 • 2
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 129
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos Paper • 2407.12679 • Published Jul 17, 2024 • 8
Towards Retrieval Augmented Generation over Large Video Libraries Paper • 2406.14938 • Published Jun 21, 2024 • 21
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging Paper • 2405.02305 • Published Mar 20, 2024 • 2