Whisper Distillation

community

https://github.com/huggingface/distil-whisper

AI & ML interests

Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling.

Organization Card

Distil-Whisper

[Paper] [Models] [Colab] [Training Code]

Distil-Whisper is a distilled version of Whisper that is 6 times faster, roughly half the size (756M vs. 1550M parameters for the large checkpoints), and performs within 1% word error rate (WER) of Whisper on out-of-distribution evaluation sets:

Model	Params (M)	Rel. Latency ↑	Short-Form WER ↓	Long-Form WER ↓
large-v3	1550	1.0	8.4	11.0
distil-large-v3	756	6.3	9.7	10.8
distil-large-v2	756	5.8	10.1	11.6
distil-medium.en	394	6.8	11.1	12.4
distil-small.en	166	5.6	12.1	12.8
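
To make the table easier to read against the large-v3 baseline, here is a minimal sketch that derives the size ratio and the WER gaps from the rows above. The numbers are copied from the table; the helper name `compare_to_baseline` is hypothetical, the WER gaps are absolute percentage-point differences, and reading "Rel. Latency" as a speed-up factor (higher is better) is an assumption based on the arrow in the header.

```python
# Rows copied from the table above:
# (params in millions, relative latency, short-form WER %, long-form WER %)
TABLE = {
    "large-v3": (1550, 1.0, 8.4, 11.0),
    "distil-large-v3": (756, 6.3, 9.7, 10.8),
    "distil-large-v2": (756, 5.8, 10.1, 11.6),
    "distil-medium.en": (394, 6.8, 11.1, 12.4),
    "distil-small.en": (166, 5.6, 12.1, 12.8),
}


def compare_to_baseline(name, baseline="large-v3"):
    """Return size ratio, speed-up, and WER gaps (in points) vs. the baseline row."""
    params, speedup, short_wer, long_wer = TABLE[name]
    base_params, _, base_short, base_long = TABLE[baseline]
    return {
        "size_ratio": round(params / base_params, 3),
        "speedup": speedup,
        "short_wer_gap": round(short_wer - base_short, 1),
        "long_wer_gap": round(long_wer - base_long, 1),
    }


if __name__ == "__main__":
    for name in TABLE:
        if name != "large-v3":
            print(name, compare_to_baseline(name))
```

A negative gap means the distilled checkpoint has a lower WER than large-v3 on that column, as is the case for distil-large-v3 on long-form audio.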

For most applications, we recommend the latest distil-large-v3 checkpoint, since it is the most performant distilled checkpoint and is compatible with all Whisper libraries. The only exception is resource-constrained applications with very little memory, such as on-device or mobile applications, where distil-small.en is a great choice: it has only 166M parameters and performs within 4% WER of Whisper large-v3.
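
The recommendation above can be sketched as a small selection helper plus a transcription call. This is an illustration rather than the official usage: it assumes the Hugging Face `transformers` library (with a backend such as PyTorch, and ffmpeg for decoding audio files) is installed, that the clip is short (under about 30 seconds), and the function names `pick_checkpoint` and `transcribe` are hypothetical. The repository ids are the ones listed on this page.

```python
LARGE = "distil-whisper/distil-large-v3"   # recommended default
SMALL = "distil-whisper/distil-small.en"   # for very low-memory settings


def pick_checkpoint(memory_constrained: bool = False) -> str:
    """Return the repository id suggested for the given memory budget."""
    return SMALL if memory_constrained else LARGE


def transcribe(audio_path: str, memory_constrained: bool = False) -> str:
    """Transcribe a short English audio file with the selected checkpoint."""
    # Imported lazily so pick_checkpoint works without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=pick_checkpoint(memory_constrained),
    )
    return asr(audio_path)["text"]


# Example call (downloads the checkpoint on first use; the path is a placeholder):
# print(transcribe("sample.wav"))
```

Longer recordings generally need the pipeline's chunking or timestamp options, which are left out here to keep the sketch short.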

Note: Distil-Whisper is currently only available for English speech recognition. We are working with the community to distill Whisper on other languages. If you are interested in distilling Whisper in your language, check out the provided training code. We will update the repository with multilingual checkpoints when they are ready.

spaces 2

Whisper vs Distil-Whisper

Whisper Analysis

Analyze Whisper and Distil-Whisper transcriptions

models 12

distil-whisper/distil-large-v3.5

Automatic Speech Recognition • Updated 18 days ago • 3.81k • 18

distil-whisper/distil-large-v3.5-ONNX

Automatic Speech Recognition • Updated 18 days ago • 7 • 1

distil-whisper/distil-large-v3.5-ggml

Automatic Speech Recognition • Updated 23 days ago • 2

distil-whisper/distil-large-v3.5-ct2

Automatic Speech Recognition • Updated 23 days ago • 191 • 2

distil-whisper/distil-large-v3.5-openai

Automatic Speech Recognition • Updated 23 days ago

distil-whisper/distil-large-v3

Automatic Speech Recognition • Updated Mar 6 • 1.18M • 308

distil-whisper/distil-large-v2

Automatic Speech Recognition • Updated Mar 6 • 36k • 508

distil-whisper/distil-large-v3-openai

Automatic Speech Recognition • Updated Mar 27, 2024 • 4

distil-whisper/distil-small.en

Automatic Speech Recognition • Updated Mar 25, 2024 • 40.7k • 97

distil-whisper/distil-medium.en

Automatic Speech Recognition • Updated Mar 25, 2024 • 197k • 120

datasets 37

distil-whisper/librispeech_asr_dummy-concatenated

Viewer • Updated Dec 15, 2023 • 17 • 32

distil-whisper/librispeech_asr_dummy

Viewer • Updated Nov 10, 2023 • 146 • 373

distil-whisper/librispeech_long

Viewer • Updated Nov 2, 2023 • 1 • 14.8k • 2

distil-whisper/figures

Viewer • Updated Oct 31, 2023 • 6 • 17.4k • 2

distil-whisper/meanwhile

Viewer • Updated Oct 17, 2023 • 64 • 3.33k

distil-whisper/rev16

Viewer • Updated Oct 17, 2023 • 46 • 150

distil-whisper/earnings22

Viewer • Updated Oct 13, 2023 • 57.5k • 2.87k • 6

distil-whisper/earnings21

Viewer • Updated Oct 13, 2023 • 44 • 84 • 2

distil-whisper/whisper_transcriptions_token_ids

Viewer • Updated Oct 11, 2023 • 340k • 20

distil-whisper/gigaspeech-l-token-ids

Updated Oct 11, 2023 • 28