Cool-Whisper

Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee

⚠️ Due to privacy and security concerns, this model will be temporarily taken offline. We are sorry for the inconvenience.

Introduction

  • Cool-Whisper is a distilled version of Whisper, focused mainly on Mandarin-English code-switching ASR for speakers in Taiwan.
  • We train the model on 60,000 hours of unlabeled audio.
  • In practice, we draw on knowledge not only from the large model (Whisper-large-v2) but also from the small model (Whisper-base).
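
The bullets above do not say how knowledge from the two models is combined. As a purely illustrative sketch, and not necessarily the authors' procedure, one plausible scheme is to keep a pseudo-label produced by the large model only when the small model's transcript of the same audio agrees with it closely enough. The function names, the character-level agreement measure, and the `min_agreement` threshold of 0.8 are all assumptions made for this example.

```python
from typing import List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def agreement_rate(large_text: str, small_text: str) -> float:
    """Return 1 - CER of the small-model text against the large-model text, floored at 0."""
    if not large_text:
        return 1.0 if not small_text else 0.0
    cer = edit_distance(large_text, small_text) / len(large_text)
    return max(0.0, 1.0 - cer)


def filter_pseudo_labels(
    pairs: List[Tuple[str, str]], min_agreement: float = 0.8
) -> List[str]:
    """Keep large-model transcripts whose small-model counterpart agrees enough.

    `pairs` holds (large_model_text, small_model_text) for the same utterance.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [
        large
        for large, small in pairs
        if agreement_rate(large, small) >= min_agreement
    ]
```

Character-level comparison is used here because it applies uniformly to mixed Mandarin and English text without needing a word tokenizer.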

Basic Usage

from faster_whisper import WhisperModel
import soundfile as sf

model_card = "andybi7676/cool-whisper"

# Inspect the input file (sample rate, channels, duration) before transcribing.
audio_fpath = "/your/path/to/audio.wav"
audio_info = sf.info(audio_fpath)
print(audio_info)  # for debugging

# Load the model on the GPU in half precision.
model = WhisperModel(model_card, device="cuda", compute_type="float16")

# Use language="zh" for zh-en code-switching in Cool-Whisper.
segments, info = model.transcribe(
    audio_fpath,
    beam_size=5,
    language="zh",
    condition_on_previous_text=True,
)
# `segments` is a generator: transcription runs as the loop consumes it.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
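
The loop above prints each segment as it arrives. If subtitles are wanted instead, the same `start`, `end`, and `text` fields can be written out in SubRip (SRT) form. The sketch below is a minimal example under that assumption; `FakeSegment`, `format_timestamp`, and `segments_to_srt` are hypothetical helpers introduced here, not part of faster-whisper, and `FakeSegment` only stands in for real segments so the code runs without the model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSegment:
    """Hypothetical stand-in exposing the three fields used in the loop above."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp style)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Build SRT text from objects that have start, end, and text attributes."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    # Entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [FakeSegment(0.0, 1.25, " 你好 hello"), FakeSegment(1.25, 3.0, " 我們開始 meeting")]
    print(segments_to_srt(demo))
```

Because `segments_to_srt` accepts any iterable, the generator returned by `model.transcribe` could be passed to it directly; note that a generator can only be consumed once.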