Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs Paper • 2411.02256 • Published Nov 4, 2024 • 1
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset Paper • 2311.15308 • Published Nov 26, 2023 • 1
Sleeping • 5 • Gradio Demo Space creation helper V2 • Generate Gradio demo files for Hugging Face model repos
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 107
Running on CPU Upgrade • 621 • Open ASR Leaderboard • Request evaluation results for a speech model
facebook/wav2vec2-base-960h Automatic Speech Recognition • Updated Nov 14, 2022 • 3.05M • 317
Running on A10G • 302 • AudioLDM2 Text2Audio Text2Music Generation • Generate a video waveform from text-based audio descriptions