sanchit-gandhi committed
Commit 4506258
1 Parent(s): d7299c4

add section on OAI whisper

Files changed (1): README.md (+34 -4)
README.md CHANGED
@@ -46,7 +46,7 @@ pip install --upgrade transformers accelerate datasets[audio]
 ### Short-Form Transcription
 
 The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
-class to transcribe short-form audio files as follows:
+class to transcribe short-form audio files (< 30-seconds) as follows:
 
 ```python
 import torch
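
For reference, the short-form snippet truncated by the hunk context above continues roughly along these lines (a minimal sketch, assuming the standard 🤗 Transformers ASR pipeline API and the `distil-whisper/distil-medium.en` checkpoint used in the new section below):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# select device and dtype based on hardware availability
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-medium.en"

# load the model and processor from the Hugging Face Hub
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# build the ASR pipeline for short-form (< 30-seconds) audio
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

# transcribe a sample from a dummy LibriSpeech split
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```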
@@ -91,7 +91,7 @@ To transcribe a local audio file, simply pass the path to your audio file when y
 
 ### Long-Form Transcription
 
-Distil-Whisper uses a chunked algorithm to transcribe long-form audio files. In practice, this chunked long-form algorithm
+Distil-Whisper uses a chunked algorithm to transcribe long-form audio files (> 30-seconds). In practice, this chunked long-form algorithm
 is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper (see Table 7 of the [Distil-Whisper paper](https://arxiv.org/abs/2311.00430)).
 
 To enable chunking, pass the `chunk_length_s` parameter to the `pipeline`. For Distil-Whisper, a chunk length of 15-seconds
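
In practice, enabling chunking looks roughly as follows (a minimal sketch, assuming the same `pipeline` API as above; the file name `long_audio.mp3` is a placeholder):

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# chunk_length_s enables the chunked long-form algorithm; 15-seconds is the
# chunk length recommended for Distil-Whisper, and batch_size lets chunks be
# transcribed in parallel (a throughput/memory trade-off, not required)
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-medium.en",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    chunk_length_s=15,
    batch_size=16,
    device=device,
)

result = pipe("long_audio.mp3")
print(result["text"])
```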
@@ -241,9 +241,39 @@ Coming soon ...
 
 Coming soon ...
 
-### Running Whisper in `openai/whisper`
+### Running Whisper in `openai-whisper`
 
-Coming soon ...
+To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:
+
+```bash
+pip install --upgrade openai-whisper
+```
+
+The following code snippet demonstrates how to transcribe a sample file from the LibriSpeech dataset loaded using
+🤗 Datasets:
+
+```python
+import torch
+from datasets import load_dataset
+from huggingface_hub import hf_hub_download
+from whisper import load_model, transcribe
+
+medium_en = hf_hub_download(repo_id="distil-whisper/distil-medium.en", filename="original-model.bin")
+model = load_model(medium_en)
+
+dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
+sample = dataset[0]["audio"]["array"]
+sample = torch.from_numpy(sample).float()
+
+pred_out = transcribe(model, audio=sample)
+print(pred_out["text"])
+```
+
+To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:
+
+```python
+pred_out = transcribe(model, audio="audio.mp3")
+```
 
 
 ### Transformers.js
 