Supplemental reading and resources
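This unit provided a hands-on introduction to speech recognition, one of the most popular tasks in the audio domain. Want to learn more? Here you will find additional resources to help you deepen your understanding of the topic and enhance your learning experience.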
- Whisper Talk by Jong Wook Kim: a presentation on the Whisper model by one of its authors, covering the motivation, architecture, training and results
- End-to-End Speech Benchmark (ESB): a paper that comprehensively argues for using the orthographic WER as opposed to the normalised WER for evaluating ASR systems and presents an accompanying benchmark
- Fine-Tuning Whisper for Multilingual ASR: an in-depth blog post that explains in more detail how the Whisper model works, along with the pre- and post-processing steps performed by the feature extractor and tokenizer
- Fine-tuning MMS Adapter Models for Multi-Lingual ASR: an end-to-end guide for fine-tuning Meta AI’s new MMS speech recognition models, freezing the base model weights and only fine-tuning a small number of adapter layers
- Boosting Wav2Vec2 with n-grams in 🤗 Transformers: a blog post on combining CTC models with external language models (LMs) to combat spelling and punctuation errors