---
language:
- en
tags:
- audio
- automatic-speech-recognition
license: mit
library_name: ctranslate2
---

# Distil-Whisper: distil-large-v3 for CTranslate2

This repository contains the model weights for [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3)
converted to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format. CTranslate2 is a fast inference engine for
Transformer models and is the supported backend for the [Faster-Whisper](https://github.com/systran/faster-whisper) package.
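
If you want to reproduce the conversion yourself, CTranslate2 ships a converter for Transformers checkpoints. The following is a minimal sketch, assuming the `ctranslate2` and `transformers` packages are installed and the `TransformersConverter` API is unchanged; the output directory name is arbitrary:

```python
# sketch: convert the Hub checkpoint to CTranslate2 format
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter(
    "distil-whisper/distil-large-v3",  # source checkpoint on the Hugging Face Hub
    copy_files=["tokenizer.json", "preprocessor_config.json"],  # keep processor files alongside the weights
)
converter.convert("distil-large-v3-ct2", quantization="float16")
```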

## Usage

To use the model in Faster-Whisper, first install the PyPI package according to the [official instructions](https://github.com/SYSTRAN/faster-whisper#installation).
For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub:

```bash
pip install --upgrade pip
pip install --upgrade faster-whisper "datasets[audio]"
```

The following code snippet loads the distil-large-v3 model and runs inference on an example file from the LibriSpeech ASR
dataset:

```python
import torch
from faster_whisper import WhisperModel
from datasets import load_dataset

# define our torch configuration
device = "cuda" if torch.cuda.is_available() else "cpu"
compute_type = "float16" if torch.cuda.is_available() else "float32"

# load model on GPU if available, else CPU
model = WhisperModel("distil-large-v3", device=device, compute_type=compute_type)

# load a toy dataset and take the path to one audio sample
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[1]["audio"]["path"]

# segments is a generator: transcription only runs as it is iterated over
segments, info = model.transcribe(sample, beam_size=1)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
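
Faster-Whisper can also return word-level timestamps. A minimal sketch reusing the model and sample from above, assuming the upstream `word_timestamps` argument:

```python
# sketch: word-level timestamps (each segment carries a list of timestamped words)
segments, info = model.transcribe(sample, beam_size=1, word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
```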

To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:

```python
segments, info = model.transcribe("audio.mp3", beam_size=1)
```
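
The second return value, `info`, holds metadata about the transcription. As a small usage example, the detected language and its probability can be read off it (attribute names as in upstream Faster-Whisper):

```python
# sketch: inspect the transcription metadata
print("Detected language '%s' with probability %.2f" % (info.language, info.language_probability))
```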

## Model Details

For more information about the distil-large-v3 model, refer to the original [model card](https://huggingface.co/distil-whisper/distil-large-v3).

## License

Distil-Whisper inherits the [MIT license](https://github.com/huggingface/distil-whisper/blob/main/LICENSE) from OpenAI's Whisper model.

## Citation

If you use this model, please consider citing the [Distil-Whisper paper](https://arxiv.org/abs/2311.00430):
```bibtex
@misc{gandhi2023distilwhisper,
      title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling},
      author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
      year={2023},
      eprint={2311.00430},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```