---
title: README
emoji: 🐨
colorFrom: blue
colorTo: yellow
sdk: static
pinned: false
---

# Japanese ASR

This repository contains the models and datasets for training and evaluating Japanese ASR that were produced while developing the [kotoba-whisper models](https://huggingface.co/collections/kotoba-tech/kotoba-whisper-661d04846a2892cc27a23921).

The tables below compare the character error rate (CER) and word error rate (WER) of models distilled from [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) using different amounts of ReazonSpeech data, alongside OpenAI Whisper and ReazonSpeech NeMo baselines. Lower is better for both metrics. The distilled models are named `japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size of reazonspeech}`.

***CER (%)***

| model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held-out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
|:---|---:|---:|---:|
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all) | 9.2 | 8.4 | 11.6 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large) | 9.4 | 8.5 | 12.2 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium) | 10.9 | 11.3 | 14.8 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small) | 30.2 | 39 | 40.7 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny) | 94.8 | 96.3 | 96.7 |
| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 8.5 | 7.1 | 14.9 |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 9.7 | 8.2 | 28.1 |
| [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 10 | 8.9 | 34.1 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.5 | 10 | 33.2 |
| [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 28.6 | 24.9 | 70.4 |
| [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.1 | 14.2 | 41.5 |
| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 53.7 | 36.5 | 137.9 |
| [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2) | 9.1 | 7.4 | 11.2 |

***WER (%)***

| model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held-out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
|:---|---:|---:|---:|
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all) | 58.8 | 63.7 | 55.6 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large) | 59.2 | 64.3 | 56.4 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium) | 64.6 | 72.1 | 63 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small) | 85 | 94.2 | 82.1 |
| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny) | 100 | 100 | 99 |
| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 55.1 | 59.2 | 60.2 |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 59.3 | 63.2 | 74.1 |
| [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 61.1 | 66.4 | 74.9 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.4 | 69.5 | 76 |
| [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 87.2 | 93 | 91.8 |
| [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.2 | 81.9 | 83 |
| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.8 | 97.6 | 94.9 |
| [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2) | 57.5 | 60.6 | 47.5 |

Note that [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) is an alias of [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large), and [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) is an alias of [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all). More detailed results are available in the [kotoba-whisper codebase](https://github.com/kotoba-tech/kotoba-whisper).
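CER and WER are both edit-distance rates: the number of substitutions, deletions, and insertions needed to turn a model's transcript into the reference, divided by the number of reference characters (CER) or words (WER). The snippet below is a minimal pure-Python sketch of that definition, not the evaluation code behind the tables above. The function names are illustrative, and it assumes corpus-level aggregation (total edits over total reference tokens), no text normalization, and whitespace word splitting for WER; see the kotoba-whisper codebase for the evaluation actually used.

```python
from typing import Callable, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into nothing needs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of reference token r
                curr[j - 1] + 1,         # insertion of hypothesis token h
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str],
               tokenize: Callable[[str], Sequence[str]]) -> float:
    """Corpus-level error rate in percent (assumption: not averaged per utterance)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return 100.0 * errors / total


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Each character is one token; no normalization is applied (assumption).
    return error_rate(refs, hyps, list)


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Assumption: words are whitespace-separated. Japanese text has no spaces,
    # so real transcripts would normally be segmented into words first.
    return error_rate(refs, hyps, str.split)


print(cer(["今日は晴れ"], ["今日は雨"]))  # 40.0 (one substitution, one deletion)
print(cer(["a"], ["bcd"]))               # 300.0 (insertions push it past 100)
```

Because insertions count as errors, the rate is not capped at 100%, which is how a model that emits much more text than the reference (such as openai/whisper-tiny on the ReazonSpeech test set, CER 137.9) can exceed it.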