sanchit-gandhi HF staff committed on
Commit
9e3e0be
1 Parent(s): c7bea3d

whisper cpp

Files changed (1)
  1. README.md +30 -4
README.md CHANGED
@@ -263,6 +263,36 @@ To transcribe a local audio file, simply pass the path to the audio file as the
263
  pred_out = transcribe(model, audio="audio.mp3")
264
  ```
265
 
266
  ### Transformers.js
267
 
268
  ```js
@@ -312,10 +342,6 @@ cargo run --example whisper --release -- --model distil-medium.en --input audio.
312
 
313
  Coming soon ...
314
 
315
- ### Whisper.cpp
316
-
317
- Coming soon ...
318
-
319
  ## Model Details
320
 
321
  Distil-Whisper inherits the encoder-decoder architecture from Whisper. The encoder maps a sequence of speech vector
 
263
  pred_out = transcribe(model, audio="audio.mp3")
264
  ```
265
 
266
+ ### Whisper.cpp
267
+
268
+ Distil-Whisper can be run from the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) repository with the original
269
+ sequential long-form transcription algorithm. In a [provisional benchmark](https://github.com/ggerganov/whisper.cpp/pull/1424#issuecomment-1793513399)
270
+ on Mac M1, `distil-medium.en` is 4x faster than `large-v2`, while performing to within 1% WER over long-form audio.
271
+
272
+ Steps for getting started:
273
+ 1. Clone the Whisper.cpp repository:
274
+ ```bash
275
+ git clone https://github.com/ggerganov/whisper.cpp.git
276
+ cd whisper.cpp
277
+ ```
278
+ 2. Download the ggml weights for `distil-medium.en` from the Hugging Face Hub:
279
+
280
+ ```bash
281
+ python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-medium.en', filename='ggml-medium-32-2.en.bin', local_dir='./models')"
282
+ ```
283
+
284
+ If you do not have the `huggingface_hub` package installed, you can download the weights with `wget` instead:
285
+
286
+ ```bash
287
+ wget https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/ggml-medium-32-2.en.bin -P ./models
288
+ ```
289
+
290
+ 3. Run inference using the provided sample audio:
291
+
292
+ ```bash
293
+ make -j && ./main -m models/ggml-medium-32-2.en.bin -f samples/jfk.wav
294
+ ```
295
+
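The three steps added above can also be driven from a script. The following is a minimal sketch, not part of the commit: the helper names `weights_url`, `build_command` and `transcribe` are hypothetical, and it assumes `make` has already produced the `./main` binary in the cloned `whisper.cpp` directory. It only reuses the URL pattern and the `-m` / `-f` flags shown in the steps.

```python
import subprocess

# Values taken from the steps above.
REPO_ID = "distil-whisper/distil-medium.en"
FILENAME = "ggml-medium-32-2.en.bin"


def weights_url(repo_id=REPO_ID, filename=FILENAME):
    """Build the direct-download URL used in the wget step (hypothetical helper)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"


def build_command(model_path, audio_path, binary="./main"):
    """Assemble the whisper.cpp invocation from step 3 as an argument list."""
    return [binary, "-m", str(model_path), "-f", str(audio_path)]


def transcribe(model_path, audio_path, binary="./main"):
    """Run whisper.cpp and return whatever it prints to stdout.

    Assumes the binary was built beforehand; raises CalledProcessError
    if the process exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(model_path, audio_path, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `transcribe("models/ggml-medium-32-2.en.bin", "samples/jfk.wav")` would run the same command as step 3, provided it is called from inside the `whisper.cpp` directory.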
296
  ### Transformers.js
297
 
298
  ```js
 
342
 
343
  Coming soon ...
344
 
345
  ## Model Details
346
 
347
  Distil-Whisper inherits the encoder-decoder architecture from Whisper. The encoder maps a sequence of speech vector