---
language: ja
datasets:
- common_voice
metrics:
- cer
model-index:
- name: wav2vec2-xls-r-300m finetuned on Japanese Hiragana with no word boundaries
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Japanese
      type: common_voice
      args: ja
    metrics:
    - name: Test CER
      type: cer
      value: 9.34
---
# Wav2Vec2-XLS-R-300M-Japanese-Hiragana
[facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) fine-tuned to transcribe Japanese speech into Hiragana characters, using JSUT, JVS, Common Voice, and an in-house dataset.
The output sentences contain no word boundaries. Audio input should be sampled at 16 kHz.
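
A minimal inference sketch, not taken from the original card. It assumes the repository ships a standard `Wav2Vec2Processor` alongside a CTC head, and that `librosa`, `torch`, and `transformers` are installed. `MODEL_ID` is a hypothetical placeholder to be replaced with this model's actual Hub id, and `transcribe` is an illustrative helper name.

```python
TARGET_SAMPLE_RATE = 16_000  # the card asks for audio sampled at 16 kHz

# Hypothetical placeholder: replace with this model's actual Hub repository id.
MODEL_ID = "your-namespace/wav2vec2-xls-r-300m-japanese-hiragana"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the Hiragana transcription of one audio file (greedy decoding)."""
    # Heavy dependencies are imported lazily so this module stays cheap to load.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: the repository provides both a tokenizer and a feature extractor.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate while loading.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE, mono=True)
    inputs = processor(speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the tokenizer collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (with a real model id and audio file) should return a single Hiragana string without spaces between words.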
## Test Results
**CER:** 9.34% on the Common Voice Japanese test set
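
For reference, character error rate is commonly computed as the character-level edit distance divided by the number of reference characters, summed over the test set. The card does not say which implementation produced the figure above, so the following is only an illustrative pure-Python sketch; `edit_distance` and `cer` are hypothetical helper names.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Two substitutions over five reference characters.
print(cer(["こんにちは"], ["こんばんは"]))  # 0.4
```

Because the model emits Hiragana without word boundaries, no whitespace normalization is applied here; multiply the result by 100 to compare with a percentage.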
## Training
Trained on JSUT, a subset of JVS, the train and validation splits of Common Voice Japanese, and an in-house Japanese dataset. Evaluated on the test split of Common Voice Japanese.