---
license: mit
language:
- th
base_model: biodatlab/whisper-th-small-combined
---

# Whisper-th-small-ct2

whisper-th-small-ct2 is the CTranslate2 format of [biodatlab/whisper-th-small-combined](https://huggingface.co/biodatlab/whisper-th-small-combined), compatible with [WhisperX](https://github.com/m-bain/whisperX) and [faster-whisper](https://github.com/SYSTRAN/faster-whisper), which enables:

- ⚡️ Batched inference for **70x** real-time transcription with Whisper large-v2.
- 🪶 A faster-whisper backend, requiring **<8GB GPU memory** for large-v2 with beam_size=5 (see the direct-loading sketch after this list).
- 🎯 Accurate word-level timestamps using wav2vec2 alignment (a sketch of this step follows the usage example).
- 👯‍♂️ Multispeaker ASR using speaker diarization (includes speaker ID labels).
- 🗣️ VAD preprocessing, reducing hallucinations and allowing batching with no WER degradation.
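
Because the weights are already in CTranslate2 format, they can also be loaded directly with faster-whisper, without going through WhisperX. A minimal sketch, assuming faster-whisper is installed and reusing the model ID and audio file from the usage example below:

```python
# pip install faster-whisper
from faster_whisper import WhisperModel

# Load the CTranslate2 checkpoint straight from the Hugging Face Hub
model = WhisperModel("Thaweewat/whisper-th-small-ct2", device="cuda", compute_type="float16")

# segments is a lazy generator; iterating over it runs the actual inference
segments, info = model.transcribe("audio.mp3", beam_size=5, language="th")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```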

### Usage

```python
# In a notebook, install WhisperX first:
# !pip install git+https://github.com/m-bain/whisperx.git

import time

import whisperx

# Settings
device = "cuda"
audio_file = "audio.mp3"
batch_size = 16
compute_type = "float16"

# A Hugging Face token is required for the diarization model, and you must
# accept its terms of use beforehand at the model page:
# https://huggingface.co/pyannote/segmentation-3.0
HF_TOKEN = ""

# Load the model and transcribe
model = whisperx.load_model("Thaweewat/whisper-th-small-ct2", device, compute_type=compute_type)
st_time = time.time()
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)

# Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# Combine plain text if needed
combined_text = " ".join(segment["text"] for segment in result["segments"])

print(f"Response time: {time.time() - st_time:.2f} seconds")
print(diarize_segments)
print(result)
print(combined_text)
```
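
The word-level timestamps mentioned in the feature list come from a separate wav2vec2 alignment pass that the snippet above omits. A hedged sketch of that step, following the WhisperX README and run between transcription and speaker assignment: if WhisperX ships no default aligner for Thai, a wav2vec2 checkpoint must be passed explicitly via `model_name` (the checkpoint below is an assumption, verify it before use).

```python
# Align the transcript for word-level timestamps (run after model.transcribe(...)).
# model_name is only needed if WhisperX has no default aligner for the language;
# airesearch/wav2vec2-large-xlsr-53-th is an assumed example Thai checkpoint.
model_a, metadata = whisperx.load_align_model(
    language_code=result["language"],
    device=device,
    model_name="airesearch/wav2vec2-large-xlsr-53-th",
)
result = whisperx.align(
    result["segments"], model_a, metadata, audio, device,
    return_char_alignments=False,
)
# result["segments"] now carries per-word start/end times,
# which assign_word_speakers can then label with speaker IDs.
```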