asahi417 committed on
Commit
10d660a
·
verified ·
1 Parent(s): aa89fea

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +24 -2
README.md CHANGED
@@ -18,8 +18,13 @@ This model can be used in CTranslate2 or projects based on CTranslate2 such as [
18
  Install library and download sample audio.
19
  ```shell
20
  pip install faster-whisper
21
- wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/sample_ja_speech.wav
 
 
 
 
22
  ```
 
23
  Inference with kotoba-whisper-bilingual-v1.0-faster.
24
 
25
  ```python
@@ -27,9 +32,26 @@ from faster_whisper import WhisperModel
27
 
28
  model = WhisperModel("kotoba-tech/kotoba-whisper-bilingual-v1.0-faster")
29
 
30
- segments, info = model.transcribe("sample_ja_speech.wav", language="ja", chunk_length=15, condition_on_previous_text=False)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31
  for segment in segments:
32
  print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
 
33
  ```
34
 
35
  ### Benchmark
 
18
  Install library and download sample audio.
19
  ```shell
20
  pip install faster-whisper
21
+ wget https://huggingface.co/datasets/japanese-asr/en_asr.esb_eval/resolve/main/sample.wav -O sample_en.wav
22
+ wget https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000/resolve/main/sample.flac -O sample_ja.flac
23
+ ffmpeg -i sample_en.wav -ar 16000 -ac 1 -c:a pcm_s16le sample_en_fixed.wav
24
+ ffmpeg -i sample_ja.flac -ar 16000 -ac 1 -c:a pcm_s16le sample_ja_fixed.wav
25
+
26
  ```
27
+
28
  Inference with kotoba-whisper-bilingual-v1.0-faster.
29
 
30
  ```python
 
32
 
33
  model = WhisperModel("kotoba-tech/kotoba-whisper-bilingual-v1.0-faster")
34
 
35
+ # Japanese ASR
36
+ segments, info = model.transcribe("sample_ja.flac", language="ja", task="transcribe", chunk_length=15, condition_on_previous_text=False)
37
+ for segment in segments:
38
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
39
+
40
+ # English ASR
41
+ segments, info = model.transcribe("sample_en_fixed.wav", language="en", task="transcribe", chunk_length=15, condition_on_previous_text=False)
42
+ for segment in segments:
43
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
44
+
45
+ # Japanese (speech) to English (text) Translation
46
+ segments, info = model.transcribe("sample_ja.flac", language="en", task="translate", chunk_length=15, condition_on_previous_text=False)
47
+ for segment in segments:
48
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
49
+
50
+ # English (speech) to Japanese (text) Translation
51
+ segments, info = model.transcribe("sample_en.wav", language="ja", task="translate", chunk_length=15, condition_on_previous_text=False)
52
  for segment in segments:
53
  print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
54
+
55
  ```
56
 
57
  ### Benchmark