mpc001/auto_avsr · Apply for community grant: Academic project

I am writing to apply for a grant to support our academic project "Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels".

Audio-visual speech recognition (AVSR) is the task of transcribing speech into text from audio and visual streams. It has recently attracted considerable research attention because of its robustness to noise: since the visual stream is unaffected by acoustic noise, an audio-visual model can outperform an audio-only model as the noise level increases.

This robustness to acoustic noise makes AVSR models attractive for real-world applications, and we expect them to have a considerable impact on speech research. Our project aims to offer an accessible, user-friendly tool for transcribing speech from audio-only, video-only, and audio-visual input. We hope this will make speech recognition technology more accessible to researchers and developers in a range of fields.

Although the toolbox runs on CPU-only systems, processing is slow enough to significantly hinder its use. To give users a smooth experience, GPU hardware is essential: with a GPU, the toolbox can run efficiently and return results in real time.

Thank you for considering our application. We appreciate your time and attention.