languages
#4, opened by ChandlerGIS
How can I use other languages?
I tested Japanese and Hindi. It seems to support only English for now, according to this comment: https://github.com/huggingface/distil-whisper/issues/6#issuecomment-1790869897
I am posting a Japanese how-to here for future reference in case anyone needs it (it does not work for now, but will probably work after an update).
Test Japanese audio and text:
https://kaiidams.github.io/Kokoro-Speech-Dataset/tacotron.html
import librosa

# Assumes `model`, `processor`, `device`, and `torch_dtype` are already set up as in the distil-whisper model card
with open("kokoro-ja-tacotron-sample5.txt", "r", encoding="utf-8") as f:
    text = f.read()
print("Original Text:", text)
audio_data = "kokoro-ja-tacotron-sample5.wav"
# Resample to 16 kHz on load (librosa defaults to 22050 Hz) so the audio matches the rate passed to the processor
x, sample_rate = librosa.load(audio_data, sr=16000)
# 1. Pre-process the audio data to log-mel spectrogram inputs
input_features = processor(x, sampling_rate=sample_rate, return_tensors="pt").input_features
input_features = input_features.to(device, dtype=torch_dtype)
# 2. Auto-regressively generate the predicted token ids
pred_ids = model.generate(input_features, max_new_tokens=128, language="ja", task="transcribe")
# 3. Decode the token ids to the final transcription
result = processor.batch_decode(pred_ids, skip_special_tokens=True)
print("distil-whisper result:", result)
Original Text: 中山 さま と いう おと の さま が 、 おら れ た そう です 。
distil-whisper result: [' Na-sama to-same the to-sama to the toe to the toe.']
# This is the faster-whisper result, for comparison
Detected language 'ja' with probability 0.978516
faster whisper small result: {'text': '中、山様というとの様が、おられたそうです。', 'segments': [(0.0, 5.2, '中、山様というとの様が、おられたそうです。')]}
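For a rough number on the gap between the two outputs above, here is a minimal sketch of character error rate (CER) in plain Python. The `normalize`, `edit_distance`, and `cer` helpers are hypothetical names written for this post, not part of distil-whisper or faster-whisper. The score is only indicative: the dataset text writes some words in kana where faster-whisper outputs kanji (さま vs 様), so a correct transcription is still penalized.

```python
import unicodedata


def normalize(text: str) -> str:
    """Drop whitespace and punctuation so only letters, kana, and kanji are compared."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed to turn the hypothesis into the reference, per reference character."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


# Strings copied from the outputs in this thread
reference = "中山 さま と いう おと の さま が 、 おら れ た そう です 。"
distil_output = " Na-sama to-same the to-sama to the toe to the toe."
faster_output = "中、山様というとの様が、おられたそうです。"

print(f"distil-whisper CER: {cer(reference, distil_output):.2f}")
print(f"faster-whisper CER: {cer(reference, faster_output):.2f}")
```

A CER above 1.0 is possible when the hypothesis is longer than the reference and shares almost no characters with it, which is what happens when Japanese audio is decoded into English text.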
distil-large-v2 is English-only for now. Soon (next week) we will release training code that will allow you to train/distil Whisper in your language of choice!
ChandlerGIS changed discussion status to closed