---
language: ja
datasets:
- common_voice
metrics:
- cer
model-index:
- name: wav2vec2-xls-r-300m finetuned on Japanese Hiragana with no word boundaries
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Japanese
      type: common_voice
      args: ja
    metrics:
    - name: Test CER
      type: cer
      value: 9.34
---

# Wav2Vec2-XLS-R-300M-Japanese-Hiragana

This model is [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) fine-tuned to transcribe Japanese speech into Hiragana characters, using JSUT, JVS, Common Voice, and an in-house dataset.

Output sentences contain no word boundaries.

Audio input should be sampled at 16 kHz.

## Test Results

**CER:** 9.34%

## Training

Trained on JSUT, a subset of JVS, the train+valid sets of Common Voice Japanese, and an in-house Japanese dataset.

Tested on the test set of Common Voice Japanese.
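For readers unfamiliar with the metric, the CER reported above is a character error rate: the character-level edit distance between the model output and the reference transcript, divided by the number of reference characters. The following is a minimal sketch of that standard definition in plain Python. The function names `edit_distance` and `cer` and the sample strings are illustrative assumptions; the card does not state which tool or text normalization produced the 9.34% figure, so this may not reproduce it exactly.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars


# Hypothetical Hiragana transcripts without word boundaries (not taken from the test set).
references = ["こんにちは", "ありがとう"]
hypotheses = ["こんにちわ", "ありがとう"]
print(f"CER: {cer(references, hypotheses):.2%}")  # prints "CER: 10.00%"
```

Because the output has no word boundaries, a character-level metric such as CER is a natural fit here, whereas a word error rate would require segmenting the text first.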