Fedir Zadniprovskyi committed · Commit f043430 · 1 parent: ce5dbe5

docs: update README.md

Files changed:
- README.md (+64 -36)
- audio.wav (+0 -0)
- faster_whisper_server/main.py (+0 -2)
README.md (CHANGED)
````diff
@@ -1,13 +1,23 @@
-`faster-whisper-server` is …
+# Faster Whisper Server
+`faster-whisper-server` is an OpenAI API compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
+Features:
+- GPU and CPU support.
+- Easily deployable using Docker.
+- Configurable through environment variables (see [config.py](./faster_whisper_server/config.py)).
+- OpenAI API compatible.
+
 Please create an issue if you find a bug, have a question, or a feature suggestion.
+
+## OpenAI API Compatibility ++
+See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
+- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
+  - Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and would rather receive the transcription in chunks as they are processed than wait for the whole file to be transcribed. It works similarly to the way chat messages are streamed when chatting with LLMs.
+- Audio file translation via `POST /v1/audio/translations` endpoint.
+- (WIP) Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
+  - The LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
+  - Only transcription of single-channel, 16000 sample rate, raw, 16-bit little-endian audio is supported.
+
+## Quick Start
 Using Docker
 ```bash
 docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:cuda
````
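The streaming mode added above can also be consumed without the OpenAI SDK. Below is a minimal Python sketch, assuming the server runs on localhost:8000 and that the `streaming=true` form field from the CURL examples further down yields a body of concatenated JSON objects (that framing is inferred from the sample outputs in this README, not confirmed by it); `requests` is this sketch's dependency, not the project's.

```python
import json

import requests

# Send the multipart upload; stream=True defers reading the response body.
with open("audio.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": f},
        data={"streaming": "true", "language": "en"},
        stream=True,
    )
resp.raise_for_status()

# Assumed framing: a concatenation of objects like {"text":"One,"}{"text":"One, two,"}.
decoder = json.JSONDecoder()
buffer = ""
for chunk in resp.iter_content(chunk_size=None):
    buffer += chunk.decode("utf-8")
    while buffer:
        try:
            obj, end = decoder.raw_decode(buffer)
        except json.JSONDecodeError:
            break  # incomplete object; wait for more bytes
        print(obj["text"])
        buffer = buffer[end:].lstrip()
```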
````diff
@@ -17,42 +27,60 @@ docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface …
 Using Docker Compose
 ```bash
 curl -sO https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
-docker compose up --detach
+docker compose up --detach faster-whisper-server-cuda
 # or
-docker compose up --detach
+docker compose up --detach faster-whisper-server-cpu
 ```
 ## Usage
+### OpenAI API CLI
 ```bash
+export OPENAI_API_KEY="cant-be-empty"
+export OPENAI_BASE_URL=http://localhost:8000/v1/
+```
+```bash
+openai api audio.transcriptions.create -m distil-medium.en -f audio.wav --response-format text
+
+openai api audio.translations.create -m distil-medium.en -f audio.wav --response-format verbose_json
+```
+### OpenAI API Python SDK
+```python
+from openai import OpenAI
+
+client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")
+
+audio_file = open("audio.wav", "rb")
+transcript = client.audio.transcriptions.create(
+    model="distil-medium.en", file=audio_file
+)
+print(transcript.text)
+```
+
+### CURL
+```bash
+# If `model` isn't specified, the default model is used
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.mp3"
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "streaming=true"
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "streaming=true" -F "model=distil-large-v3"
+# It's recommended that you always specify the language, as that will reduce the transcription time
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "streaming=true" -F "model=distil-large-v3" -F "language=en"
+
+curl http://localhost:8000/v1/audio/translations -F "file=@audio.wav"
+```
+
+### Live Transcription
+[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) installation is required.
+Live transcribing audio data from a microphone.
+```bash
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
 ```
 Streaming audio data from a file.
 ```bash
-ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - > …
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - > audio.raw
 # send all data at once
-cat …
+cat audio.raw | websocat --no-close --binary ws://localhost:8000/v1/audio/transcriptions
 # Output: {"text":"One,"}{"text":"One, two, three, four, five."}{"text":"One, two, three, four, five."}%
 # streaming 16000 samples per second. each sample is 2 bytes
-cat …
+cat audio.raw | pv -qL 32000 | websocat --no-close --binary ws://localhost:8000/v1/audio/transcriptions
 # Output: {"text":"One,"}{"text":"One, two,"}{"text":"One, two, three,"}{"text":"One, two, three, four, five."}{"text":"One, two, three, four, five. one."}%
 ```
-Transcribing a file
-```bash
-# convert the file if it has a different format
-ffmpeg -i output.wav -ac 1 -ar 16000 -f s16le output.raw
-curl -X POST -F "file=@output.raw" http://0.0.0.0:8000/v1/audio/transcriptions
-# Output: "{\"text\":\"One, two, three, four, five.\"}"%
-```
-## Roadmap
-- [ ] Support file transcription (non-streaming) of multiple formats.
-- [ ] CLI client.
-- [ ] Separate the web server related code from the "core", and publish "core" as a package.
-- [ ] Additional documentation and code comments.
-- [ ] Write benchmarks for measuring streaming transcription performance. Possible metrics:
-  - Latency (time when the transcription is sent minus time when the audio was received)
-  - Accuracy (already being measured when testing, but the process can be improved)
-  - Total seconds of audio transcribed / audio duration (since each audio chunk is being processed at least twice)
-- [ ] Get the API response closer to the format used by OpenAI.
-- [ ] Integrations...
````
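The websocat examples above pace raw PCM at 32000 bytes per second, since 16000 samples/s × 2 bytes per sample = 32000 bytes/s. A rough Python counterpart for the `WS /v1/audio/transcriptions` endpoint, assuming the third-party `websockets` package (not a project dependency) and an `audio.raw` file produced by the ffmpeg command shown in the diff:

```python
# Stream raw PCM (mono, 16 kHz, 16-bit little-endian) to the live endpoint,
# mirroring `cat audio.raw | pv -qL 32000 | websocat ...` from the README.
import asyncio

import websockets

BYTES_PER_SEC = 32000  # 16000 samples/s * 2 bytes per sample
CHUNK = 4000           # 4000 / 32000 = 0.125 s of audio per send

async def live_transcribe(path: str) -> None:
    async with websockets.connect("ws://localhost:8000/v1/audio/transcriptions") as ws:
        async def sender() -> None:
            with open(path, "rb") as f:
                while chunk := f.read(CHUNK):
                    await ws.send(chunk)
                    # pace the upload to simulate a real-time microphone feed
                    await asyncio.sleep(CHUNK / BYTES_PER_SEC)

        async def receiver() -> None:
            # runs until the server closes the connection (Ctrl+C to stop)
            async for message in ws:
                print(message)  # e.g. {"text":"One, two,"}

        await asyncio.gather(sender(), receiver())

asyncio.run(live_transcribe("audio.raw"))
```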
audio.wav (ADDED)
Binary file (209 kB).
faster_whisper_server/main.py (CHANGED)
```diff
@@ -237,7 +237,6 @@ async def transcribe_stream(
     ws: WebSocket,
     model: Annotated[Model, Query()] = config.whisper.model,
     language: Annotated[Language | None, Query()] = config.default_language,
-    prompt: Annotated[str | None, Query()] = None,
     response_format: Annotated[
         ResponseFormat, Query()
     ] = config.default_response_format,
@@ -246,7 +245,6 @@ async def transcribe_stream(
     await ws.accept()
     transcribe_opts = {
         "language": language,
-        "initial_prompt": prompt,
         "temperature": temperature,
         "vad_filter": True,
         "condition_on_previous_text": False,
```
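For context on the removed `initial_prompt` entry: faster-whisper is the project's backend, and a `transcribe_opts` dict like the one above is presumably unpacked into `WhisperModel.transcribe`, which accepts all of these keys (including `initial_prompt`). A minimal sketch; the model name and audio path are illustrative, not taken from the commit:

```python
from faster_whisper import WhisperModel

model = WhisperModel("distil-medium.en")
transcribe_opts = {
    "language": "en",
    "temperature": 0.0,
    "vad_filter": True,                   # drop non-speech via voice activity detection
    "condition_on_previous_text": False,  # don't feed prior output back in
}
# `initial_prompt` (removed by this commit) was passed the same way.
segments, info = model.transcribe("audio.wav", **transcribe_opts)
print("".join(segment.text for segment in segments))
```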