---
language: ja
datasets:
- common_voice
metrics:
- cer
model-index:
- name: wav2vec2-xls-r-300m finetuned on Japanese Hiragana with no word boundaries
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Japanese
      type: common_voice
      args: ja
    metrics:
       - name: Test CER
         type: cer
         value: 9.34
---
# Wav2Vec2-XLS-R-300M-Japanese-Hiragana
A fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) for Japanese speech recognition that outputs Hiragana characters. It was trained on JSUT, JVS, Common Voice, and an in-house dataset.
The output sentences contain no word boundaries. Audio input should be sampled at 16 kHz.
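
Because the model emits Hiragana with no word boundaries, reference text usually has to be brought into the same form before it can be compared with a prediction. The sketch below is one possible way to do that and is not taken from the training code: the helper name `to_unsegmented_hiragana` is hypothetical, and it only handles kana and whitespace. Kanji are left untouched, since converting them needs a reading dictionary.

```python
import unicodedata

# Katakana ァ..ヶ sit exactly 0x60 code points above their hiragana counterparts.
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60


def to_unsegmented_hiragana(text: str) -> str:
    """Hypothetical helper: map katakana to hiragana and drop all whitespace."""
    # NFKC folds half-width katakana and ideographic spaces into standard forms.
    text = unicodedata.normalize("NFKC", text)
    chars = []
    for ch in text:
        if ch.isspace():
            continue  # the model output has no word boundaries
        code = ord(ch)
        if KATAKANA_START <= code <= KATAKANA_END:
            ch = chr(code - KANA_OFFSET)
        chars.append(ch)
    return "".join(chars)


if __name__ == "__main__":
    print(to_unsegmented_hiragana("コンニチハ　せかい"))
```

The long vowel mark `ー` falls outside the mapped range and is kept as-is; whether that matches the model's vocabulary is not stated in this card.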

## Test Results
**CER:** 9.34% (Common Voice Japanese test set)
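
For readers unfamiliar with the metric, character error rate is the character-level edit distance between prediction and reference, divided by the reference length. The following is a minimal pure-Python sketch, not the evaluation script used for the figure above. It assumes corpus-level aggregation (total edits over total reference characters), which this card does not specify, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER in percent (assumed aggregation, see lead-in)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # One substituted character out of five.
    print(cer(["こんにちは"], ["こんにちわ"]))
```

Since the output has no word boundaries, CER is a more natural fit here than word error rate.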
## Training
The model was trained on JSUT, a subset of JVS, the train and validation splits of Common Voice Japanese, and an in-house Japanese dataset. It was evaluated on the test split of Common Voice Japanese.