This dataset is made available by Google Inc. under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. https://research.google.com/ava/download.html

AVA Speech Dataset

The AVA-Speech dataset annotates speech activity for the movie clips in the AVA v1.0 dataset. It explicitly labels 3 background noise conditions (Clean Speech, Speech with background Music, and Speech with background Noise), resulting in ~40K labeled segments spanning 40 hours of data. Please visit the project page for more details on the dataset.

This dataset contains audio clips and their Speech/Noise labels, with 2000 samples per class.
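Since the description promises 2000 samples per class, a quick balance check after download can be useful. The sketch below is only an illustration under an assumed layout. It assumes the labels ship as a CSV with hypothetical `filename` and `label` columns; the real file names, column names and label strings may differ. The helper names `count_per_class` and `is_balanced` are also hypothetical.

```python
import csv
import io
from collections import Counter


# ASSUMPTION: labels are stored as CSV text with hypothetical columns
# "filename" and "label". Adjust the column name to the actual files.
def count_per_class(csv_text: str) -> Counter:
    """Count how many clips carry each label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["label"] for row in reader)


def is_balanced(counts: Counter, expected: int = 2000) -> bool:
    """Return True if there is at least one class and every class has exactly `expected` clips."""
    return len(counts) > 0 and all(n == expected for n in counts.values())


if __name__ == "__main__":
    # Tiny synthetic example, not real dataset rows.
    sample = "filename,label\na.wav,Speech\nb.wav,Noise\nc.wav,Speech\n"
    counts = count_per_class(sample)
    print(counts)
    print(is_balanced(counts, expected=2))
```

On the real label file, `is_balanced(counts)` with the default of 2000 should hold if the per-class count stated above is accurate.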