metadata

language: ja
datasets:
  - common_voice
metrics:
  - cer
model-index:
  - name: wav2vec2-xls-r-300m finetuned on Japanese Hiragana with no word boundaries
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Japanese
          type: common_voice
          args: ja
        metrics:
          - name: Test CER
            type: cer
            value: 9.34

Wav2Vec2-XLS-R-300M-Japanese-Hiragana

facebook/wav2vec2-xls-r-300m fine-tuned on Japanese Hiragana characters using JSUT, JVS, Common Voice, and an in-house dataset. The output sentences contain no word boundaries. Audio input should be sampled at 16 kHz.

Test Results

CER on the Common Voice Japanese test set: 9.34%
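For readers unfamiliar with the metric, character error rate is the character-level edit distance between the reference and the predicted text, divided by the number of reference characters. The sketch below shows the usual corpus-level definition in plain Python. It is an illustration only: the exact evaluation script, text normalization, and aggregation behind the figure above are not stated in this card, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1      # reference character missing from hypothesis
            insertion = current[j - 1] + 1  # extra character in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(references, hypotheses):
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    refs = ["こんにちは", "ありがとう"]
    hyps = ["こんにちわ", "ありがとう"]
    print(cer(refs, hyps))  # 10.0 (one substitution over ten reference characters)
```

Because the model emits Hiragana without word boundaries, a character-level metric such as CER is a natural fit, whereas a word-level metric would need a separate segmentation step.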

Training

Trained on JSUT, a subset of JVS, the train and validation splits of Common Voice Japanese, and an in-house Japanese dataset. Evaluated on the test split of Common Voice Japanese.