MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published 22 days ago • 18
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 36
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 18
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 119
Bengali Regional Text to IPA Models Collection A collection of models for transcribing Bengali Regional Text to the International Phonetic Alphabets (IPA). • 3 items • Updated 24 days ago • 1