Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs Paper • 2411.02256 • Published Nov 4, 2024 • 1
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset Paper • 2311.15308 • Published Nov 26, 2023 • 1
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 104