kotoba-tech/kotoba-whisper-bilingual-v1.0-faster

Automatic Speech Recognition


asahi417 committed on Sep 29, 2024

Commit 3b597b8 · verified · 1 Parent(s): 8f3f575

Update README.md

Files changed (1)

README.md +2 -5

@@ -20,9 +20,6 @@ Install library and download sample audio.
 pip install faster-whisper
 wget https://huggingface.co/datasets/japanese-asr/en_asr.esb_eval/resolve/main/sample.wav -O sample_en.wav
 wget https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000/resolve/main/sample.flac -O sample_ja.flac
-ffmpeg -i sample_en.wav -ar 16000 -ac 1 -c:a pcm_s16le sample_en_fixed.wav
-ffmpeg -i sample_ja.flac -ar 16000 -ac 1 -c:a pcm_s16le sample_ja_fixed.wav
 ```
 Inference with the kotoba-whisper-bilingual-v1.0-faster.
@@ -33,12 +30,12 @@ from faster_whisper import WhisperModel
 model = WhisperModel("kotoba-tech/kotoba-whisper-bilingual-v1.0-faster")
 # Japanese ASR
-segments, info = model.transcribe("sample_ja.flac", language="ja", task="transcribe", chunk_length=15, condition_on_previous_text=False)
+segments, info = model.transcribe("sample_ja.flac", language="ja", task="transcribe", condition_on_previous_text=False)
 for segment in segments:
     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
 # English ASR
-segments, info = model.transcribe("sample_en_fixed.wav", language="en", task="transcribe", chunk_length=15, condition_on_previous_text=False)
+segments, info = model.transcribe("sample_en.wav", language="en", task="transcribe", chunk_length=15, condition_on_previous_text=False)
 for segment in segments:
     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
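
For reference, the README example as it stands after this commit can be sketched as a single script. This is a minimal sketch, not the author's code: the helper names `format_segment` and `transcribe_file` are hypothetical and introduced only for illustration, while the model ID, file names, and `transcribe` arguments are taken from the diff. It assumes `faster-whisper` is installed and the two sample files have been downloaded with the `wget` commands shown earlier.

```python
from typing import Iterator, Optional

# Model ID as it appears in the README.
MODEL_ID = "kotoba-tech/kotoba-whisper-bilingual-v1.0-faster"


def format_segment(start: float, end: float, text: str) -> str:
    """Format one segment the same way the README's print statement does."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text)


def transcribe_file(model, path: str, language: str,
                    chunk_length: Optional[int] = None) -> Iterator[str]:
    """Hypothetical helper: yield formatted lines for one audio file.

    `model` is expected to be a faster_whisper.WhisperModel instance.
    `chunk_length` is passed through only when given, mirroring the commit,
    which keeps it for the English call and drops it for the Japanese one.
    """
    kwargs = {
        "language": language,
        "task": "transcribe",
        "condition_on_previous_text": False,
    }
    if chunk_length is not None:
        kwargs["chunk_length"] = chunk_length
    segments, _info = model.transcribe(path, **kwargs)
    for segment in segments:
        yield format_segment(segment.start, segment.end, segment.text)


if __name__ == "__main__":
    # Imported here so the helpers above can be used without the library.
    from faster_whisper import WhisperModel

    model = WhisperModel(MODEL_ID)
    # Japanese ASR (no chunk_length after this commit)
    for line in transcribe_file(model, "sample_ja.flac", "ja"):
        print(line)
    # English ASR (chunk_length=15 is kept, now on the unconverted sample_en.wav)
    for line in transcribe_file(model, "sample_en.wav", "en", chunk_length=15):
        print(line)
```

Both calls now read the downloaded files directly, which matches the removal of the two `ffmpeg` conversion steps in the first hunk.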