SpeechBrain is an open-source, all-in-one conversational AI toolkit for audio and speech. Its goal is to provide a single, flexible, user-friendly toolkit for developing state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, speech separation, language identification, multi-microphone signal processing, and more.
You can find SpeechBrain models by filtering at the left of the models page.
All models on the Hub come with the following features:
- An automatically generated model card with a brief description.
- Metadata tags that help with discoverability, with information such as the language, license, paper, and more.
- An interactive widget you can use to try out the model directly in the browser.
- An Inference API that lets you make inference requests.
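As a sketch of the last point, the Inference API can be called over HTTP by POSTing raw audio bytes to the model's endpoint with a Hugging Face access token. The snippet below only builds the request; the `HF_TOKEN` environment variable and the placeholder audio bytes are assumptions for illustration.

```python
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id: str, audio_bytes: bytes, token: str) -> urllib.request.Request:
    """Build a POST request sending raw audio bytes to the Inference API."""
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=audio_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )

# Build (but do not send) a request. Replace the placeholder bytes with the
# contents of a real .wav file, and set HF_TOKEN to your access token.
req = build_request(
    "speechbrain/urbansound8k_ecapa",
    b"RIFF...",  # placeholder for real audio bytes
    os.environ.get("HF_TOKEN", "hf_xxx"),
)
# urllib.request.urlopen(req).read() would then return the JSON predictions.
```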
SpeechBrain offers different interfaces for managing pretrained models for different tasks, such as `SpectralMaskEnhancement`. These classes have a `from_hparams` method you can use to load a model from the Hub.
Here is an example of running inference for sound recognition on urban sounds.
```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/urbansound8k_ecapa"
)
out_prob, score, index, text_lab = classifier.classify_file(
    "speechbrain/urbansound8k_ecapa/dog_bark.wav"
)
```
If you want to see how to load a specific model, you can click
Use in speechbrain and you will be given a working snippet to load it!