Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs Paper • 2411.02256 • Published Nov 4, 2024 • 1
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset Paper • 2311.15308 • Published Nov 26, 2023 • 2
Sleeping • 5 • Gradio Demo Space creation helper V2 • Generate Gradio demo files for Hugging Face model repos
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 112
Running on CPU Upgrade • 708 • Open ASR Leaderboard • Request and view assessments for speech recognition models
facebook/wav2vec2-base-960h Automatic Speech Recognition • Updated Nov 14, 2022 • 3.57M • 327
Running on A10G • 305 • AudioLDM2 Text2Audio Text2Music Generation • Generate audio and waveform video from text