Audio Course documentation

Check your understanding of the course material

1. What units is the sampling rate measured in?

2. When streaming a large audio dataset, how soon can you start using it?

3. What is a spectrogram?

4. What is the easiest way to convert raw audio data into the log-mel spectrogram that Whisper expects?

A.

librosa.feature.melspectrogram(audio["array"])

B.

from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
feature_extractor(audio["array"])

C.

dataset.feature(audio["array"], model="whisper")

5. How do you load a dataset from the 🤗 Hub?

A.

from datasets import load_dataset

dataset = load_dataset(DATASET_NAME_ON_HUB)

B.

import librosa

dataset = librosa.load(PATH_TO_DATASET)

C.

from transformers import load_dataset

dataset = load_dataset(DATASET_NAME_ON_HUB)

6. Your custom dataset contains high-quality audio with a 32 kHz sampling rate. You want to train a speech recognition model that expects the audio examples to have a 16 kHz sampling rate. What should you do?

7. How can you convert a spectrogram generated by a machine learning model into a waveform?
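
For readers who want to experiment with the sampling-rate scenario from question 6 after finishing the quiz, here is a minimal sketch of resampling a waveform from 32 kHz to 16 kHz. It is an illustration under stated assumptions, not the course's own solution: the synthetic 440 Hz tone stands in for a real recording, the helper name `resample_audio` is hypothetical, and SciPy's polyphase resampler is just one of several reasonable choices.

```python
import math

import numpy as np
from scipy.signal import resample_poly

ORIG_SR = 32_000    # sampling rate of the hypothetical custom dataset, in Hz
TARGET_SR = 16_000  # sampling rate the model expects, in Hz


def resample_audio(waveform: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D waveform with a polyphase filter (hypothetical helper).

    resample_poly applies a low-pass filter internally, which limits aliasing
    when the sampling rate is reduced.
    """
    divisor = math.gcd(orig_sr, target_sr)
    return resample_poly(waveform, up=target_sr // divisor, down=orig_sr // divisor)


# One second of a synthetic 440 Hz tone standing in for a real recording.
t = np.arange(ORIG_SR) / ORIG_SR
waveform = np.sin(2 * np.pi * 440.0 * t)

resampled = resample_audio(waveform, ORIG_SR, TARGET_SR)

# The duration is unchanged; only the number of samples per second differs.
print(len(waveform) / ORIG_SR, len(resampled) / TARGET_SR)
```

When the audio lives in a 🤗 Datasets dataset, the same effect is usually obtained by casting the audio column, for example `dataset.cast_column("audio", Audio(sampling_rate=16_000))`, which resamples examples on the fly as they are loaded.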