sanchit-gandhi (HF staff) committed
Commit 75040bd
Parent: 1ca7336

Update README.md

Files changed (1):
  1. README.md +12 -19

README.md CHANGED
@@ -41,7 +41,7 @@ to distill Whisper on other languages. If you are interested in distilling Whisp
  provided [training code](https://github.com/huggingface/distil-whisper/tree/main/training). We will update the
  [Distil-Whisper repository](https://github.com/huggingface/distil-whisper/) with multilingual checkpoints when ready!

- ### Why is `distil-small.en` slower than `distil-large-v2`?
+ ### Why is distil-small.en slower than distil-large-v2?

  While [distil-medium.en](https://huggingface.co/distil-whisper/distil-medium.en) and [distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
  use two decoder layers each, distil-small.en uses four. Using more decoder layers improves the WER performance of the
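For context on the hunk above: the decoder depth quoted in the README can be checked directly from each checkpoint's Hub config. A minimal sketch, illustrative only and not part of this commit (assumes `transformers` is installed and the Hub is reachable):

```python
from transformers import AutoConfig

# Read the decoder depth from each checkpoint's config (values per the README:
# distil-small.en uses 4 decoder layers, the other two use 2)
for repo_id in (
    "distil-whisper/distil-small.en",
    "distil-whisper/distil-medium.en",
    "distil-whisper/distil-large-v2",
):
    config = AutoConfig.from_pretrained(repo_id)
    print(f"{repo_id}: {config.decoder_layers} decoder layers")
```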
@@ -170,7 +170,7 @@ In the following code-snippet, we load the assistant Distil-Whisper model standa
  specify it as the "assistant model" for generation:

  ```python
- from transformers import pipeline, AutoModelForCausalLM, AutoModelForSpeechSeq2Seq, AutoProcessor
+ from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
  import torch
  from datasets import load_dataset
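The hunk above only trims the import line; the surrounding README snippet (outside this diff's context) loads distil-small.en as the draft model for speculative decoding. A minimal sketch of that pattern, assuming a recent `transformers` release with assisted generation and `openai/whisper-small.en` as the main checkpoint (the exact pairing is not shown in this diff):

```python
import torch
from datasets import load_dataset
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Distilled assistant: drafts tokens cheaply for the main model to verify.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-small.en", torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)

# Main model: this checkpoint is an assumption for illustration.
model_id = "openai/whisper-small.en"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Passing the assistant via `generate_kwargs` enables speculative decoding:
# the assistant proposes tokens and the main model accepts or rejects them.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"assistant_model": assistant_model},
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
result = pipe(dataset[0]["audio"])
print(result["text"])
```

Since both checkpoints share the Whisper tokenizer, the drafted tokens can be verified directly, so the output matches what the main model would produce on its own, only faster.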
 
@@ -249,10 +249,6 @@ model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dt

  ### Running Distil-Whisper in `openai-whisper`

- Coming soon!
-
- <!---
-
  To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:

  ```bash
@@ -268,8 +264,8 @@ from datasets import load_dataset
  from huggingface_hub import hf_hub_download
  from whisper import load_model, transcribe

- medium_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
- model = load_model(medium_en)
+ distil_small_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
+ model = load_model(distil_small_en)

  dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
  sample = dataset[0]["audio"]["array"]
@@ -279,22 +275,21 @@ pred_out = transcribe(model, audio=sample)
  print(pred_out["text"])
  ```

+ Note that the model weights will be downloaded and saved to your cache the first time you run the example. Subsequently,
+ you can re-use the same example, and the weights will be loaded directly from your cache without having to download them
+ again.
+
  To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to transcribe:

  ```python
  pred_out = transcribe(model, audio="audio.mp3")
  ```
- --->

  ### Whisper.cpp

- Coming soon!
-
- <!---
-
  Distil-Whisper can be run from the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) repository with the original
  sequential long-form transcription algorithm. In a [provisional benchmark](https://github.com/ggerganov/whisper.cpp/pull/1424#issuecomment-1793513399)
- on Mac M1, `distil-medium.en` is 4x faster than `large-v2`, while performing to within 1% WER over long-form audio.
+ on Mac M1, `distil-small.en` is over 4x faster than `large-v2`, while performing to within 1.4% WER over long-form audio.

  Steps for getting started:
  1. Clone the Whisper.cpp repository:
@@ -305,23 +300,21 @@ cd whisper.cpp
  2. Download the ggml weights for `distil-small.en` from the Hugging Face Hub:

  ```bash
- python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-small.en', filename='ggml-medium-32-2.en.bin', local_dir='./models')"
+ python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-small.en', filename='ggml-distil-small.en.bin', local_dir='./models')"
  ```

  Note that if you do not have the `huggingface_hub` package installed, you can also download the weights with `wget`:

  ```bash
- wget https://huggingface.co/distil-whisper/distil-small.en/resolve/main/ggml-medium-32-2.en.bin -P ./models
+ wget https://huggingface.co/distil-whisper/distil-small.en/resolve/main/ggml-distil-small.en.bin -P ./models
  ```

  3. Run inference using the provided sample audio:

  ```bash
- make -j && ./main -m models/ggml-medium-32-2.en.bin -f samples/jfk.wav
+ make -j && ./main -m models/ggml-distil-small.en.bin -f samples/jfk.wav
  ```

- --->
-
  ### Transformers.js

  ```js
 