distil-whisper/distil-large-v2

ChandlerGIS

Nov 3, 2023

How can I use other languages?

gorkemgoknar

Nov 3, 2023


edited Nov 3, 2023

I tested Japanese and Hindi. According to this comment, it seems to support English only for now: https://github.com/huggingface/distil-whisper/issues/6#issuecomment-1790869897

I am posting a Japanese how-to here for future reference in case anyone needs it. It doesn't work yet, but it will probably work after an update.

Japanese test audio and text:
https://kaiidams.github.io/Kokoro-Speech-Dataset/tacotron.html

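```python
import librosa
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v2"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype).to(device)
processor = AutoProcessor.from_pretrained(model_id)

with open("kokoro-ja-tacotron-sample5.txt", "r", encoding="utf-8") as f:
    text = f.read()

print("Original Text:", text)

audio_data = "kokoro-ja-tacotron-sample5.wav"
# Resample to 16 kHz on load; librosa.load defaults to 22050 Hz, but Whisper expects 16 kHz
x, sample_rate = librosa.load(audio_data, sr=16000)

# 1. Pre-process the audio data to log-mel spectrogram inputs
input_features = processor(x, sampling_rate=sample_rate, return_tensors="pt").input_features
input_features = input_features.to(device, dtype=torch_dtype)

# 2. Auto-regressively generate the predicted token ids
pred_ids = model.generate(input_features, max_new_tokens=128, language="ja", task="transcribe")

# 3. Decode the token ids to the final transcription
result = processor.batch_decode(pred_ids, skip_special_tokens=True)
print("distil-whisper result:", result)
```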


Original Text: 中山 さま と いう おと の さま が 、 おら れ た そう です 。

distil-whisper result: [' Na-sama to-same the to-sama to the toe to the toe.']

# This is the faster-whisper result:
Detected language 'ja' with probability 0.978516
faster whisper small result: {'text': '中、山様というとの様が、おられたそうです。', 'segments': [(0.0, 5.2, '中、山様というとの様が、おられたそうです。')]}
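The post doesn't show the script behind the faster-whisper comparison. Here is a minimal sketch, assuming the `faster-whisper` package's `WhisperModel.transcribe` API, which returns a lazy generator of segments (each with `start`, `end`, `text`) plus an info object with `language` and `language_probability`. `to_result_dict` and `transcribe_file` are hypothetical helpers that rebuild the dict shape printed above. The model size, `device="cpu"` and `compute_type="int8"` are assumptions, not the poster's settings.

```python
from typing import Iterable, Protocol


class SegmentLike(Protocol):
    # Anything with start/end/text attributes, e.g. a faster-whisper Segment.
    start: float
    end: float
    text: str


def to_result_dict(segments: Iterable[SegmentLike]) -> dict:
    """Hypothetical helper: build {'text': ..., 'segments': [(start, end, text), ...]}."""
    segs = list(segments)  # consume the (possibly lazy) iterable once
    return {
        # Join raw texts first so English segments keep their leading spaces, then trim.
        "text": "".join(s.text for s in segs).strip(),
        "segments": [(s.start, s.end, s.text.strip()) for s in segs],
    }


def transcribe_file(path: str, model_size: str = "small") -> dict:
    """Hypothetical helper: transcribe one file with faster-whisper (assumed settings)."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # No language argument: faster-whisper detects the language itself.
    segments, info = model.transcribe(path, beam_size=5)
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    # Decoding actually runs while to_result_dict consumes the segment generator.
    return to_result_dict(segments)
```

Calling `transcribe_file("kokoro-ja-tacotron-sample5.wav")` would print the detected language and return a dict of the same shape as the result above. Exact text and timestamps depend on the model version and decoding settings.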

patrickvonplaten

Whisper Distillation org Nov 3, 2023

distil-large-v2 is English-only for now. Soon (next week) we will release training code that will allow you to train/distil Whisper in your language of choice!
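Until distilled checkpoints exist for other languages, one practical workaround is to send non-English audio to a multilingual Whisper checkpoint instead. This is a minimal sketch. `pick_checkpoint` is a hypothetical helper, and choosing `openai/whisper-large-v2` as the fallback is an assumption (any multilingual Whisper checkpoint would do).

```python
# Hypothetical routing helper: distil-large-v2 is English-only at the time of this thread,
# so any other language code is sent to an (assumed) multilingual Whisper checkpoint.
ENGLISH_ONLY_DISTIL = "distil-whisper/distil-large-v2"
MULTILINGUAL_FALLBACK = "openai/whisper-large-v2"


def pick_checkpoint(language: str) -> str:
    """Return a Hub model id suited to the requested Whisper language code or name."""
    if language.strip().lower() in {"en", "english"}:
        return ENGLISH_ONLY_DISTIL
    return MULTILINGUAL_FALLBACK
```

The returned id can be passed to `AutoModelForSpeechSeq2Seq.from_pretrained` and `AutoProcessor.from_pretrained` in place of the hard-coded `model_id` in the Japanese example.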

ChandlerGIS changed discussion status to closed Nov 8, 2023