Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs Paper • 2411.02256 • Published Nov 4, 2024 • 1
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset Paper • 2311.15308 • Published Nov 26, 2023 • 1
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 104
AI Comic Factory: Create your own AI comic with a single prompt • Running on CPU Upgrade • 8.98k