Audio Course documentation

Supplemental reading and resources

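This unit was a hands-on introduction to speech recognition, one of the most popular tasks in the audio domain. Want to learn more? Here you will find additional resources to deepen your understanding of the topic and enhance your learning experience.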

  • Whisper Talk by Jong Wook Kim: a presentation on the Whisper model, explaining the motivation, architecture, training and results, delivered by Whisper author Jong Wook Kim
  • End-to-End Speech Benchmark (ESB): a paper that comprehensively argues for using the orthographic WER as opposed to the normalised WER for evaluating ASR systems and presents an accompanying benchmark
  • Fine-Tuning Whisper for Multilingual ASR: an in-depth blog post that explains how the Whisper model works in more detail, and the pre- and post-processing steps involved with the feature extractor and tokenizer
  • Fine-tuning MMS Adapter Models for Multi-Lingual ASR: an end-to-end guide for fine-tuning Meta AI’s new MMS speech recognition models, freezing the base model weights and only fine-tuning a small number of adapter layers
  • Boosting Wav2Vec2 with n-grams in 🤗 Transformers: a blog post on combining CTC models with external language models (LMs) to combat spelling and punctuation errors
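
To make the distinction between orthographic and normalised WER (mentioned in the ESB entry above) more concrete, here is a minimal sketch in plain Python. It computes the word error rate as the word-level edit distance divided by the number of reference words, first on the raw text and then after normalisation. The `simple_normalise` function is a hypothetical, deliberately crude normaliser (lowercasing and stripping ASCII punctuation) written for this illustration only; it is not the normaliser used in the ESB paper or by any particular library, and the example sentences are made up.

```python
import string


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def simple_normalise(text):
    """Hypothetical normaliser (assumption): lowercase and drop ASCII punctuation."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


# Made-up example: the words are all correct, but casing and punctuation differ.
reference = "Hello, world! How are you?"
hypothesis = "hello world how are you"

# Orthographic WER compares the text exactly as written.
orthographic_wer = wer(reference, hypothesis)

# Normalised WER compares the text after normalising both sides.
normalised_wer = wer(simple_normalise(reference), simple_normalise(hypothesis))

print(f"orthographic WER: {orthographic_wer:.2f}")
print(f"normalised WER:   {normalised_wer:.2f}")
```

In this toy case the orthographic WER penalises every word that differs in casing or punctuation, while the normalised WER ignores those differences, which is why the two metrics can rank the same system quite differently.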