whisper-th-large-v3-ct2

CTranslate2 (int8) conversion of biodatlab/whisper-th-large-v3-combined (Thonburian Whisper) for fast CPU/GPU inference with faster-whisper.

Built for and used by LyricBridge, an open-source karaoke maker that removes vocals and generates word-synced Thai lyrics.

  • Architecture: Whisper large-v3 (128 mel bins).
  • Format: CTranslate2, quantized int8 (CPU); use int8_float16 on GPU.
  • Language: Thai (th).
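
The device and compute-type pairing in the Format bullet can be captured in a small helper. This is a minimal sketch: `pick_compute_type` is a hypothetical name, not part of faster-whisper, and it only encodes the two combinations this card recommends.

```python
def pick_compute_type(device: str) -> str:
    """Return the compute type this card recommends for a given device."""
    # Only the two pairings stated in this card are encoded here.
    recommended = {"cpu": "int8", "cuda": "int8_float16"}
    try:
        return recommended[device]
    except KeyError:
        raise ValueError(f"no recommendation for device {device!r}") from None


for device in ("cpu", "cuda"):
    print(device, "->", pick_compute_type(device))
```

The returned string can be passed as `compute_type` when constructing the model for that device.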

How to use (faster-whisper)

from faster_whisper import WhisperModel

# Pin the revision so results stay reproducible over time.
model = WhisperModel(
    "Avocaduu14/whisper-th-large-v3-ct2",
    revision="1a1554ea606d89c937216ada609bb8585e20a36e",
    device="cpu",            # or "cuda"
    compute_type="int8",     # CPU; use "int8_float16" on GPU
)
segments, info = model.transcribe("audio.wav", language="th")
for seg in segments:
    print(seg.start, seg.end, seg.text)
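
The loop above prints `start` and `end` as raw float seconds. For lyric-style output, a formatter along these lines could be used. `format_timestamp` and `format_line` are hypothetical helpers, and the `mm:ss.cc` layout is an assumption (it is the style used by LRC lyric files), not something this repository prescribes.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as mm:ss.cc (minutes, seconds, centiseconds)."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_cs = round(seconds * 100)          # work in whole centiseconds
    minutes, rem = divmod(total_cs, 6000)    # 6000 centiseconds per minute
    secs, cs = divmod(rem, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def format_line(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as a single timestamped line."""
    return f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text.strip()}"


print(format_line(0.0, 2.5, " example text "))
```

In the transcription loop this would be called as `format_line(seg.start, seg.end, seg.text)`. For word-level timing, faster-whisper's `transcribe` also accepts `word_timestamps=True`, in which case each segment carries a `words` list whose items have `start`, `end`, and `word` attributes.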

Source model & attribution

This repository is a format conversion only (CTranslate2 / int8). No weights were retrained; only the storage/compute format changed. All modeling credit belongs to the original authors.

  • Original model: biodatlab/whisper-th-large-v3-combined (Thonburian Whisper)
  • Authors: Atirut Boribalburephan, Zaw Htet Aung, Knot Pipatsrisawat, Titipat Achakulvisut (Biomedical and Data Lab, Mahidol University)
  • Base model: openai/whisper-large-v3
  • License: Apache-2.0 (same as the source; retained here)
  • Reported quality: WER 6.59 on Common Voice 13 (th) test set (from the source model card)

Citation

@misc{thonburian_whisper_med,
  author    = {Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut},
  title     = {Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition},
  year      = {2022},
  url       = {https://huggingface.co/biodatlab/whisper-th-medium-combined},
  doi       = {10.57967/hf/0226},
  publisher = {Hugging Face}
}

Conversion

Converted with CTranslate2's ct2-transformers-converter (int8). The model uses the Whisper large-v3 architecture (128 mel bins), so use a runtime that supports large-v3 feature extraction.
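
For readers who want to reproduce a conversion like this, the sketch below assembles a `ct2-transformers-converter` command line. The exact invocation used for this repository is not recorded here, so treat it as an assumption: the output directory name is illustrative, `converter_command` is a hypothetical helper, and the `--copy_files` list follows the pattern commonly used for faster-whisper and presumes the source repository ships those files.

```python
import shlex


def converter_command(source, output_dir, quantization="int8",
                      copy_files=("tokenizer.json", "preprocessor_config.json")):
    """Build the argument list for CTranslate2's ct2-transformers-converter CLI."""
    args = [
        "ct2-transformers-converter",
        "--model", source,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]
    if copy_files:
        # Assumed file names; check that they exist in the source repository.
        args += ["--copy_files", *copy_files]
    return args


cmd = converter_command(
    "biodatlab/whisper-th-large-v3-combined",
    "whisper-th-large-v3-ct2",  # illustrative output directory
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` in an environment where `ctranslate2` and `transformers` are installed.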

Credits
