---
license: mit
language:
- th
base_model: biodatlab/whisper-th-small-combined
---

# Whisper-th-small-ct2

whisper-th-small-ct2 is the CTranslate2 format of [biodatlab/whisper-th-small-combined](https://huggingface.co/biodatlab/whisper-th-small-combined), compatible with [WhisperX](https://github.com/m-bain/whisperX) and [faster-whisper](https://github.com/SYSTRAN/faster-whisper), which enables:

- ⚡️ Batched inference for **70x** real-time transcription with Whisper large-v2.
- 🪶 A faster-whisper backend, requiring **<8 GB of GPU memory** for large-v2 with beam_size=5.
- 🎯 Accurate word-level timestamps using wav2vec2 alignment.
- 👯‍♂️ Multi-speaker ASR using speaker diarization (includes speaker ID labels).
- 🗣️ VAD preprocessing, reducing hallucinations and allowing batching with no WER degradation.

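For reference, a CTranslate2 checkpoint like this one is typically produced with CTranslate2's converter CLI. The exact flags used for this repository are an assumption; a conversion generally looks like:

```shell
pip install ctranslate2 transformers

ct2-transformers-converter \
  --model biodatlab/whisper-th-small-combined \
  --output_dir whisper-th-small-ct2 \
  --copy_files tokenizer.json preprocessor_config.json \
  --quantization float16
```

`--quantization float16` matches the `compute_type` used in the example below; omit it to keep full-precision weights.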
### Usage

```python
!pip install git+https://github.com/m-bain/whisperx.git

import whisperx
import time

# Settings
device = "cuda"
audio_file = "audio.mp3"
batch_size = 16
compute_type = "float16"

"""
Your Hugging Face token is required for the diarization model,
and you need to accept its terms and conditions before use.
Please visit the model page here:
https://huggingface.co/pyannote/segmentation-3.0
"""
HF_TOKEN = ""


# Load the model and transcribe
model = whisperx.load_model("Thaweewat/whisper-th-small-ct2", device, compute_type=compute_type)
st_time = time.time()
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)

# Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# Combine the plain text if needed
combined_text = ' '.join(segment['text'] for segment in result['segments'])

print(f"Response time: {time.time() - st_time} seconds")
print(diarize_segments)
print(result)
print(combined_text)
```
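After `assign_word_speakers`, each entry in `result['segments']` carries a `speaker` key alongside `start`, `end`, and `text`. A small helper (hypothetical, not part of WhisperX) can render that into a readable speaker-labelled transcript:

```python
def format_transcript(segments):
    """Render speaker-labelled segments as '[start-end] SPEAKER: text' lines."""
    lines = []
    for seg in segments:
        # Segments without an overlapping diarization turn may lack 'speaker'
        speaker = seg.get("speaker", "UNKNOWN")
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {seg['text'].strip()}")
    return "\n".join(lines)

# Example input with the shape WhisperX produces after assign_word_speakers
segments = [
    {"start": 0.0, "end": 2.5, "text": " สวัสดีครับ", "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 4.0, "text": " สวัสดีค่ะ", "speaker": "SPEAKER_01"},
]
print(format_transcript(segments))
```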