whisper-th-large-v3-ct2

CTranslate2 (int8) conversion of biodatlab/whisper-th-large-v3-combined (Thonburian Whisper) for fast CPU/GPU inference with faster-whisper.

Built for and used by LyricBridge — an open-source karaoke maker that removes vocals and generates word-synced Thai lyrics.

Architecture: Whisper large-v3 (128 mel bins).
Format: CTranslate2, weights quantized to int8. Use compute_type int8 on CPU and int8_float16 on GPU.
Language: Thai (th).
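
The 128-mel-bin requirement can be checked before loading the model. The sketch below assumes the downloaded model directory contains a preprocessor_config.json with a feature_size key, as Hugging Face Whisper checkpoints do; this card does not state whether the file ships with this repository. The helper name mel_bins is illustrative.

```python
import json
from pathlib import Path
from typing import Optional


def mel_bins(model_dir: str) -> Optional[int]:
    """Return feature_size from preprocessor_config.json, or None if absent.

    Assumption: the directory follows the Hugging Face Whisper layout,
    where feature_size holds the number of mel bins (128 for large-v3).
    """
    config_path = Path(model_dir) / "preprocessor_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as fh:
        return json.load(fh).get("feature_size")
```

A value other than 128 (or None) suggests the runtime may fall back to 80-bin feature extraction, which does not match large-v3.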

How to use (faster-whisper)

from faster_whisper import WhisperModel

# Pin the revision so results stay reproducible over time.
model = WhisperModel(
    "Avocaduu14/whisper-th-large-v3-ct2",
    revision="1a1554ea606d89c937216ada609bb8585e20a36e",
    device="cpu",            # or "cuda"
    compute_type="int8",     # CPU; use "int8_float16" on GPU
)
segments, info = model.transcribe("audio.wav", language="th")
for seg in segments:
    print(seg.start, seg.end, seg.text)
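
For word-synced output such as karaoke lyrics, faster-whisper can return per-word timings when transcribe is called with word_timestamps=True. The following is a minimal sketch, not LyricBridge's actual code: the helper names compute_type_for, format_word, and transcribe_words are hypothetical, and the output line format is chosen only for illustration.

```python
from typing import Iterator


def compute_type_for(device: str) -> str:
    """Map a device name to the compute type suggested in this card."""
    return "int8_float16" if device.startswith("cuda") else "int8"


def format_word(start: float, end: float, text: str) -> str:
    """Render one word as 'start-end<TAB>text' with millisecond precision."""
    return f"{start:.3f}-{end:.3f}\t{text.strip()}"


def transcribe_words(audio_path: str, device: str = "cpu") -> Iterator[str]:
    """Yield one formatted line per recognized word (hypothetical helper)."""
    # Imported lazily so the pure helpers above work without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        "Avocaduu14/whisper-th-large-v3-ct2",
        revision="1a1554ea606d89c937216ada609bb8585e20a36e",
        device=device,
        compute_type=compute_type_for(device),
    )
    segments, _info = model.transcribe(
        audio_path, language="th", word_timestamps=True
    )
    for seg in segments:
        # seg.words is only populated when word_timestamps=True.
        for word in seg.words or []:
            yield format_word(word.start, word.end, word.word)
```

Calling list(transcribe_words("audio.wav")) downloads the model on first use and returns one line per word; pass device="cuda" to run on a GPU.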

Source model & attribution

This repository is a format conversion only (CTranslate2 / int8). No weights were retrained — only the storage/compute format changed. All modeling credit belongs to the original authors.

Original model: biodatlab/whisper-th-large-v3-combined (Thonburian Whisper)
Authors: Atirut Boribalburephan, Zaw Htet Aung, Knot Pipatsrisawat, Titipat Achakulvisut — Biomedical and Data Lab, Mahidol University
Base model: openai/whisper-large-v3
License: Apache-2.0 (same as the source; retained here)
Reported quality: WER 6.59 on Common Voice 13 (th) test set (from the source model card)

Citation

@misc{thonburian_whisper_med,
  author    = {Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut},
  title     = {Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition},
  year      = {2022},
  url       = {https://huggingface.co/biodatlab/whisper-th-medium-combined},
  doi       = {10.57967/hf/0226},
  publisher = {Hugging Face}
}

Conversion

Converted with CTranslate2's ct2-transformers-converter using int8 quantization. The model uses the Whisper large-v3 architecture (128 mel bins), so use a runtime that supports large-v3 feature extraction.
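
A conversion along these lines could be reproduced roughly as follows. This is a sketch under assumptions, not the exact command used for this repository: the output directory name is arbitrary, and the --copy_files entries assume the source repository provides tokenizer.json and preprocessor_config.json. The helper names are hypothetical.

```python
import subprocess
from typing import List


def conversion_command(
    source_model: str = "biodatlab/whisper-th-large-v3-combined",
    output_dir: str = "whisper-th-large-v3-ct2",
    quantization: str = "int8",
) -> List[str]:
    """Build an argument list for CTranslate2's ct2-transformers-converter."""
    return [
        "ct2-transformers-converter",
        "--model", source_model,
        "--output_dir", output_dir,
        "--quantization", quantization,
        # Assumption: these files exist in the source repo; faster-whisper
        # uses them for tokenization and feature extraction.
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
    ]


def run_conversion() -> None:
    """Run the converter; needs the ctranslate2 and transformers packages."""
    subprocess.run(conversion_command(), check=True)
```

run_conversion() downloads the source checkpoint and writes the converted model to the output directory, which can then be passed to WhisperModel as a local path.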

Credits

biodatlab / Thonburian Whisper — the Thai-finetuned model this repo converts.
OpenAI Whisper — base architecture.
faster-whisper / CTranslate2 — inference runtime.
LyricBridge — downstream project (MIT).
