Welcome to the Hugging Face Audio course!

Dear learner,

Welcome to this course on using transformers for audio. Time and again, transformers have proven themselves to be one of the most powerful and versatile deep learning architectures, achieving state-of-the-art results across a wide range of tasks, including natural language processing, computer vision, and, more recently, audio processing.

In this course, we will explore how transformers can be applied to audio data. You’ll learn how to use them to tackle a range of audio-related tasks. Whether you are interested in speech recognition, audio classification, or generating speech from text, transformers and this course have got you covered.

To give you a taste of what these models can do, say a few words in the demo below and watch the model transcribe them in real time!

Throughout the course, you will gain an understanding of the specifics of working with audio data, learn about different transformer architectures, and train your own audio transformers by leveraging powerful pre-trained models.

This course is designed for learners with a background in deep learning and a general familiarity with transformers. No expertise in audio data processing is required. If you need to brush up on your understanding of transformers, check out our NLP Course, which covers the transformer basics in detail.

Meet the course team

Sanchit Gandhi, Machine Learning Research Engineer at Hugging Face

Hi! I’m Sanchit and I’m a machine learning research engineer for audio in the open-source team at Hugging Face 🤗. My primary focus is automatic speech recognition and translation, with the current goal of making speech models faster, lighter and easier to use.

Matthijs Hollemans, Machine Learning Engineer at Hugging Face

I’m Matthijs, and I’m a machine learning engineer for audio in the open-source team at Hugging Face. I’m also the author of a book on how to write sound synthesizers, and I create audio plug-ins in my spare time.

Maria Khalusova, Documentation & Courses at Hugging Face

I’m Maria, and I create educational content and documentation to make Transformers and other open-source tools even more accessible. I break down complex technical concepts and help folks get started with cutting-edge technologies.

Vaibhav Srivastav, ML Developer Advocate Engineer at Hugging Face

I’m Vaibhav (VB) and I’m a Developer Advocate Engineer for audio in the open-source team at Hugging Face. I research low-resource text-to-speech and help bring SoTA speech research to the masses.

Course structure

The course is structured into several units that cover various topics in depth:

  • Unit 1: learn about the specifics of working with audio data, including audio processing techniques and data preparation.
  • Unit 2: get to know audio applications and learn how to use 🤗 Transformers pipelines for different tasks, such as audio classification and speech recognition.
  • Unit 3: explore audio transformer architectures, learn how they differ, and what tasks they are best suited for.
  • Unit 4: learn how to build your own music genre classifier.
  • Unit 5: delve into speech recognition and build a model to transcribe meeting recordings.
  • Unit 6: learn how to generate speech from text.
  • Unit 7: learn how to build real-world audio applications with transformers.

Each unit includes a theoretical component, where you will gain a deep understanding of the underlying concepts and techniques. Throughout the course, we provide quizzes to help you test your knowledge and reinforce your learning. Some chapters also include hands-on exercises, where you will have the opportunity to apply what you have learned.

By the end of the course, you will have a strong foundation in using transformers for audio data and will be well-equipped to apply these techniques to a wide range of audio-related tasks.

The course units will be released in several consecutive blocks with the following publishing schedule:

Units | Publishing date
Unit 0, Unit 1, and Unit 2 | June 14, 2023
Unit 3, Unit 4 | June 21, 2023
Unit 5 | June 28, 2023
Unit 6 | July 5, 2023
Unit 7, Unit 8 | July 12, 2023

Learning paths and certification

There is no right or wrong way to take this course. All the materials in this course are 100% free, public, and open-source. You can take the course at your own pace; however, we recommend going through the units in order.

If you’d like to get certified upon completing the course, we offer two options:

Certificate type | Requirements
Certificate of completion | Complete 80% of the hands-on exercises according to instructions.
Certificate of honors | Complete 100% of the hands-on exercises according to instructions.

Each hands-on exercise outlines its completion criteria. Once you have completed enough hands-on exercises to qualify for either of the certificates, refer to the last unit of the course to learn how you can get your certificate. Good luck!

Sign up for the course

The units of this course will be released gradually over a few weeks. We encourage you to sign up for course updates so that you don’t miss new units when they are released. Learners who sign up for course updates will also be the first to learn about the special social events that we plan to host.


Enjoy the course!