---
license: mit
language:
- th
base_model: biodatlab/whisper-th-small-combined
---

# Whisper-th-small-ct2

whisper-th-small-ct2 is the CTranslate2 format of [biodatlab/whisper-th-small-combined](https://huggingface.co/biodatlab/whisper-th-small-combined), compatible with [WhisperX](https://github.com/m-bain/whisperX) and [faster-whisper](https://github.com/SYSTRAN/faster-whisper), which enables:

- ⚡️ Batched inference for **70x** real-time transcription with Whisper large-v2.
- 🪶 A faster-whisper backend, requiring **<8 GB of GPU memory** for large-v2 with beam_size=5.
- 🎯 Accurate word-level timestamps using wav2vec2 alignment.
- 👯‍♂️ Multi-speaker ASR using speaker diarization (includes speaker ID labels).
- 🗣️ VAD preprocessing, reducing hallucinations and allowing batching with no WER degradation.

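For reference, a CTranslate2 checkpoint like this one is typically produced with CTranslate2's converter CLI. The exact flags used for this repository are an assumption; a conversion generally looks like:

```shell
pip install ctranslate2 transformers

ct2-transformers-converter \
  --model biodatlab/whisper-th-small-combined \
  --output_dir whisper-th-small-ct2 \
  --copy_files tokenizer.json preprocessor_config.json \
  --quantization float16
```

`--quantization float16` matches the `compute_type` used in the example below; omit it to keep full-precision weights.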
### Usage

```python
!pip install git+https://github.com/m-bain/whisperx.git

import whisperx
import time

# Settings
device = "cuda"
audio_file = "audio.mp3"
batch_size = 16
compute_type = "float16"

"""
Your Hugging Face token is required for the diarization model,
and you need to accept its terms and conditions before use.
Please visit the model page here:
https://huggingface.co/pyannote/segmentation-3.0
"""
HF_TOKEN = ""


# Load the model and transcribe
model = whisperx.load_model("Thaweewat/whisper-th-small-ct2", device, compute_type=compute_type)
st_time = time.time()
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)

# Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# Combine the plain text if needed
combined_text = ' '.join(segment['text'] for segment in result['segments'])

print(f"Response time: {time.time() - st_time} seconds")
print(diarize_segments)
print(result)
print(combined_text)
```
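After `assign_word_speakers`, each entry in `result['segments']` carries a `speaker` key alongside `start`, `end`, and `text`. A small helper (hypothetical, not part of WhisperX) can render that into a readable speaker-labelled transcript:

```python
def format_transcript(segments):
    """Render speaker-labelled segments as '[start-end] SPEAKER: text' lines."""
    lines = []
    for seg in segments:
        # Segments without an overlapping diarization turn may lack 'speaker'
        speaker = seg.get("speaker", "UNKNOWN")
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {seg['text'].strip()}")
    return "\n".join(lines)

# Example input with the shape WhisperX produces after assign_word_speakers
segments = [
    {"start": 0.0, "end": 2.5, "text": " สวัสดีครับ", "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 4.0, "text": " สวัสดีค่ะ", "speaker": "SPEAKER_01"},
]
print(format_transcript(segments))
```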