Upload 9 files #1
by hbs2 - opened
- .gitignore +15 -0
- CONTRIBUTING.md +31 -0
- LICENSE +21 -0
- MANIFEST.in +3 -0
- README.md +276 -13
- requirements.conversion.txt +1 -0
- requirements.txt +5 -5
- setup.cfg +9 -0
- setup.py +68 -0
.gitignore
ADDED
@@ -0,0 +1,15 @@
# Byte-compiled / Optimized / DLL Files
*.pyc
*.pyo
*.pyd
__pycache__/

# Distribution / Packaging
venv/

# Unit Test
.pytest_cache/

# Ignore IDE, Editor Files
.idea/
.vscode/
CONTRIBUTING.md
ADDED
@@ -0,0 +1,31 @@
# Contributing to faster-whisper

Contributions are welcome! Here are some pointers to help you install the library for development and validate your changes before submitting a pull request.

## Install the library for development

We recommend installing the module in editable mode with the `dev` extra requirements:

```bash
git clone https://github.com/guillaumekln/faster-whisper.git
cd faster-whisper/
pip install -e .[dev]
```

## Validate the changes before creating a pull request

1. Make sure the existing tests are still passing (and consider adding new tests as well!):

```bash
pytest tests/
```

2. Reformat and validate the code with the following tools:

```bash
black .
isort .
flake8 .
```

These steps are also run automatically in the CI when you open the pull request.
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Guillaume Klein

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
MANIFEST.in
ADDED
@@ -0,0 +1,3 @@
include faster_whisper/assets/silero_vad.onnx
include requirements.txt
include requirements.conversion.txt
README.md
CHANGED
@@ -1,13 +1,276 @@
[![CI](https://github.com/guillaumekln/faster-whisper/workflows/CI/badge.svg)](https://github.com/guillaumekln/faster-whisper/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/faster-whisper.svg)](https://badge.fury.io/py/faster-whisper)

# Faster Whisper transcription with CTranslate2

**faster-whisper** is a reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

## Benchmark

### Whisper

For reference, here are the time and memory usage required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:

* [openai/whisper](https://github.com/openai/whisper)@[6dea21fd](https://github.com/openai/whisper/commit/6dea21fd7f7253bfe450f1e2512a0fe47ee2d258)
* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[3b010f9](https://github.com/ggerganov/whisper.cpp/commit/3b010f9bed9a6068609e9faf52383aea792b0362)
* [faster-whisper](https://github.com/guillaumekln/faster-whisper)@[cce6b53e](https://github.com/guillaumekln/faster-whisper/commit/cce6b53e4554f71172dad188c45f10fb100f6e3e)

### Large-v2 model on GPU

| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
| --- | --- | --- | --- | --- | --- |
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
| faster-whisper | int8 | 5 | 59s | 3091MB | 3117MB |

*Executed with CUDA 11.7.1 on an NVIDIA Tesla V100S.*

### Small model on CPU

| Implementation | Precision | Beam size | Time | Max. memory |
| --- | --- | --- | --- | --- |
| openai/whisper | fp32 | 5 | 10m31s | 3101MB |
| whisper.cpp | fp32 | 5 | 17m42s | 1581MB |
| whisper.cpp | fp16 | 5 | 12m39s | 873MB |
| faster-whisper | fp32 | 5 | 2m44s | 1675MB |
| faster-whisper | int8 | 5 | 2m04s | 995MB |

*Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.*

### Distil-whisper

| Implementation | Precision | Beam size | Time | Gigaspeech WER |
| --- | --- | --- | --- | --- |
| distil-whisper/distil-large-v2 | fp16 | 4 | - | 10.36 |
| [faster-distil-large-v2](https://huggingface.co/Systran/faster-distil-whisper-large-v2) | fp16 | 5 | - | 10.28 |
| distil-whisper/distil-medium.en | fp16 | 4 | - | 11.21 |
| [faster-distil-medium.en](https://huggingface.co/Systran/faster-distil-whisper-medium.en) | fp16 | 5 | - | 11.21 |

*Executed with CUDA 11.4 on an NVIDIA 3090.*

<details>
<summary>Testing details (click to expand)</summary>

For `distil-whisper/distil-large-v2`, the WER is tested with the code sample from [this page](https://huggingface.co/distil-whisper/distil-large-v2#evaluation). For `faster-distil-whisper`, the WER is tested with the following settings:

```python
from faster_whisper import WhisperModel

model_size = "distil-large-v2"
# model_size = "distil-medium.en"
# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5, language="en")
```
</details>

## Requirements

* Python 3.8 or greater

Unlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV), which bundles the FFmpeg libraries in its package.

### GPU

GPU execution requires the following NVIDIA libraries to be installed:

* [cuBLAS for CUDA 11](https://developer.nvidia.com/cublas)
* [cuDNN 8 for CUDA 11](https://developer.nvidia.com/cudnn)

There are multiple ways to install these libraries. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below.

<details>
<summary>Other installation methods (click to expand)</summary>

#### Use Docker

The libraries are installed in this official NVIDIA Docker image: `nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04`.

#### Install with `pip` (Linux only)

On Linux these libraries can be installed with `pip`. Note that `LD_LIBRARY_PATH` must be set before launching Python.

```bash
pip install nvidia-cublas-cu11 nvidia-cudnn-cu11

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
```

#### Download the libraries from Purfview's repository (Windows & Linux)

Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides the required NVIDIA libraries for Windows & Linux in a [single archive](https://github.com/Purfview/whisper-standalone-win/releases/tag/libs). Decompress the archive and place the libraries in a directory included in the `PATH`.

</details>

## Installation

The module can be installed from [PyPI](https://pypi.org/project/faster-whisper/):

```bash
pip install faster-whisper
```

<details>
<summary>Other installation methods (click to expand)</summary>

### Install the master branch

```bash
pip install --force-reinstall "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz"
```

### Install a specific commit

```bash
pip install --force-reinstall "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"
```

</details>

## Usage

### Faster-whisper

```python
from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

**Warning:** `segments` is a *generator*, so the transcription only starts when you iterate over it. The transcription can be run to completion by gathering the segments in a list or a `for` loop:

```python
segments, _ = model.transcribe("audio.mp3")
segments = list(segments)  # The transcription will actually run here.
```
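
Because the generator yields segments as they are produced, progress can also be reported incrementally. A minimal sketch, assuming `info.duration` (the audio duration reported by `transcribe`) is available:

```python
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    # Iterating drives the transcription; segment.end is the timestamp
    # (in seconds) reached so far, so it doubles as a progress indicator.
    progress = segment.end / info.duration * 100
    print("%3.0f%% [%.2fs -> %.2fs] %s" % (progress, segment.start, segment.end, segment.text))
```
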
### Faster-distil-whisper

For usage of `faster-distil-whisper`, please refer to: https://github.com/guillaumekln/faster-whisper/issues/533

```python
model_size = "distil-large-v2"
# model_size = "distil-medium.en"
model = WhisperModel(model_size, device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",
    beam_size=5,
    language="en",
    max_new_tokens=128,
    condition_on_previous_text=False,
)
```

NOTE: Empirically, `condition_on_previous_text=True` degrades the performance of `faster-distil-whisper` on long audio. Degradation on the first chunk was also observed with `initial_prompt`.

### Word-level timestamps

```python
segments, _ = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
```

### VAD filter

The library integrates the [Silero VAD](https://github.com/snakers4/silero-vad) model to filter out parts of the audio without speech:

```python
segments, _ = model.transcribe("audio.mp3", vad_filter=True)
```

The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the [source code](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py). They can be customized with the dictionary argument `vad_parameters`:

```python
segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
```

### Logging

The library logging level can be configured like this:

```python
import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
```

### Going further

See more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.
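
For illustration, here is a sketch exercising a few of those options. The parameter names below follow `transcribe`'s signature, but treat the exact set and defaults as version-dependent:

```python
segments, info = model.transcribe(
    "audio.mp3",
    language="en",                     # skip automatic language detection
    task="transcribe",                 # or "translate" to output English
    initial_prompt="Glossary: CTranslate2, Whisper.",  # hypothetical prompt to bias decoding
    temperature=[0.0, 0.2, 0.4],       # fallback temperatures when decoding fails
    condition_on_previous_text=False,  # reduce repetition drift on long audio
)
```
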
## Community integrations

Here is a non-exhaustive list of open-source projects using faster-whisper. Feel free to add your project to the list!

* [WhisperX](https://github.com/m-bain/whisperX) is an award-winning Python library that offers speaker diarization and accurate word-level timestamps using wav2vec2 alignment.
* [whisper-ctranslate2](https://github.com/Softcatala/whisper-ctranslate2) is a command line client based on faster-whisper and compatible with the original client from openai/whisper.
* [whisper-diarize](https://github.com/MahmoudAshraf97/whisper-diarization) is a speaker diarization tool based on faster-whisper and NVIDIA NeMo.
* [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides standalone CLI executables of faster-whisper for Windows, Linux & macOS.
* [asr-sd-pipeline](https://github.com/hedrergudene/asr-sd-pipeline) provides a scalable, modular, end-to-end multi-speaker speech-to-text solution implemented using AzureML pipelines.
* [Open-Lyrics](https://github.com/zh-plus/Open-Lyrics) is a Python library that transcribes voice files using faster-whisper and translates/polishes the resulting text into `.lrc` files in the desired language using OpenAI-GPT.
* [wscribe](https://github.com/geekodour/wscribe) is a flexible transcript generation tool supporting faster-whisper; it can export word-level transcripts, which can then be edited with [wscribe-editor](https://github.com/geekodour/wscribe-editor).
* [aTrain](https://github.com/BANDAS-Center/aTrain) is a graphical user interface implementation of faster-whisper developed at the BANDAS-Center at the University of Graz for transcription and diarization on Windows ([Windows Store App](https://apps.microsoft.com/detail/atrain/9N15Q44SZNS2)) and Linux.
* [Whisper-Streaming](https://github.com/ufal/whisper_streaming) implements real-time mode for offline Whisper-like speech-to-text models, with faster-whisper as the recommended back-end. It implements a streaming policy with self-adaptive latency based on the actual source complexity, and demonstrates the state of the art.
* [WhisperLive](https://github.com/collabora/WhisperLive) is a nearly-live implementation of OpenAI's Whisper that uses faster-whisper as the backend to transcribe audio in real time.
* [Faster-Whisper-Transcriber](https://github.com/BBC-Esq/ctranslate2-faster-whisper-transcriber) is a simple but reliable voice transcriber that provides a user-friendly interface.

## Model conversion

When loading a model from its size such as `WhisperModel("large-v3")`, the corresponding CTranslate2 model is automatically downloaded from the [Hugging Face Hub](https://huggingface.co/Systran).
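
Where the download lands, and whether it happens at all, can be controlled at construction time. A minimal sketch, assuming the `download_root` and `local_files_only` parameters of `WhisperModel`:

```python
from faster_whisper import WhisperModel

# Store the converted model under ./models instead of the default cache,
# and allow downloading it on first use.
model = WhisperModel(
    "large-v3",
    download_root="./models",
    local_files_only=False,  # set True to use only a previously cached copy
)
```
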
We also provide a script to convert any Whisper model compatible with the Transformers library, such as the original OpenAI models or user fine-tuned models.

For example, the command below converts the [original "large-v3" Whisper model](https://huggingface.co/openai/whisper-large-v3) and saves the weights in FP16:

```bash
pip install "transformers[torch]>=4.23"

ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2 \
    --copy_files tokenizer.json preprocessor_config.json --quantization float16
```

* The option `--model` accepts a model name on the Hub or a path to a model directory.
* If the option `--copy_files tokenizer.json` is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.

Models can also be converted from the code. See the [conversion API](https://opennmt.net/CTranslate2/python/ctranslate2.converters.TransformersConverter.html).
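
For instance, a sketch of the same conversion done from Python with `ctranslate2.converters.TransformersConverter` (argument names as in the linked API documentation; verify them against your installed CTranslate2 version):

```python
import ctranslate2

# Convert the Transformers checkpoint to the CTranslate2 format,
# copying the tokenizer files and quantizing the weights to FP16.
converter = ctranslate2.converters.TransformersConverter(
    "openai/whisper-large-v3",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-large-v3-ct2", quantization="float16")
```
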
### Load a converted model

1. Directly load the model from a local directory:
```python
model = faster_whisper.WhisperModel("whisper-large-v3-ct2")
```

2. [Upload your model to the Hugging Face Hub](https://huggingface.co/docs/transformers/model_sharing#upload-with-the-web-interface) and load it from its name:
```python
model = faster_whisper.WhisperModel("username/whisper-large-v3-ct2")
```
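
The upload in step 2 can also be scripted with `huggingface_hub`, which is already a dependency. A sketch, assuming you are authenticated and `username/whisper-large-v3-ct2` is a repository you can create:

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("username/whisper-large-v3-ct2", exist_ok=True)
api.upload_folder(
    repo_id="username/whisper-large-v3-ct2",
    folder_path="whisper-large-v3-ct2",  # local directory with the converted model
)
```
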
## Comparing performance against other implementations

If you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:

* Verify that the same transcription options are used, especially the same beam size. For example, openai/whisper's `model.transcribe` uses a default beam size of 1, but here we use a default beam size of 5.
* When running on CPU, make sure to set the same number of threads. Many frameworks read the environment variable `OMP_NUM_THREADS`, which can be set when running your script (a Python-level alternative is sketched after the example):

```bash
OMP_NUM_THREADS=4 python3 my_script.py
```
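
As that Python-level alternative, the thread count can be pinned on the model itself; a sketch using the `cpu_threads` parameter of `WhisperModel`:

```python
from faster_whisper import WhisperModel

# Pin inference to 4 CPU threads so the comparison is like-for-like
# with other implementations (mirrors OMP_NUM_THREADS=4 above).
model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4)
```
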
requirements.conversion.txt
ADDED
@@ -0,0 +1 @@
transformers[torch]>=4.23
requirements.txt
CHANGED
@@ -1,5 +1,5 @@
av==11.*
ctranslate2>=4.0,<5
huggingface_hub>=0.13
tokenizers>=0.13,<0.16
onnxruntime>=1.14,<2
setup.cfg
ADDED
@@ -0,0 +1,9 @@
[flake8]
max-line-length = 100
ignore =
    E203,
    W503,

[isort]
profile=black
lines_between_types=1
setup.py
ADDED
@@ -0,0 +1,68 @@
import os

from setuptools import find_packages, setup

base_dir = os.path.dirname(os.path.abspath(__file__))


def get_long_description():
    readme_path = os.path.join(base_dir, "README.md")
    with open(readme_path, encoding="utf-8") as readme_file:
        return readme_file.read()


def get_project_version():
    version_path = os.path.join(base_dir, "faster_whisper", "version.py")
    version = {}
    with open(version_path, encoding="utf-8") as fp:
        exec(fp.read(), version)
    return version["__version__"]


def get_requirements(path):
    with open(path, encoding="utf-8") as requirements:
        return [requirement.strip() for requirement in requirements]


install_requires = get_requirements(os.path.join(base_dir, "requirements.txt"))
conversion_requires = get_requirements(
    os.path.join(base_dir, "requirements.conversion.txt")
)

setup(
    name="faster-whisper",
    version=get_project_version(),
    license="MIT",
    description="Faster Whisper transcription with CTranslate2",
    long_description=get_long_description(),
    long_description_content_type="text/markdown",
    author="Guillaume Klein",
    url="https://github.com/guillaumekln/faster-whisper",
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
    ],
    keywords="openai whisper speech ctranslate2 inference quantization transformer",
    python_requires=">=3.8",
    install_requires=install_requires,
    extras_require={
        "conversion": conversion_requires,
        "dev": [
            "black==23.*",
            "flake8==6.*",
            "isort==5.*",
            "pytest==7.*",
        ],
    },
    packages=find_packages(),
    include_package_data=True,
)