

This model is facebook/wav2vec2-xls-r-300m fine-tuned to transcribe Japanese speech into Hiragana characters, using JSUT, JVS, Common Voice, and an in-house dataset. The output sentences do not contain word boundaries. Audio input should be sampled at 16 kHz.
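A minimal inference sketch follows, assuming the repository ships a standard Wav2Vec2 CTC checkpoint with a `Wav2Vec2Processor` (the card does not confirm this). The model ID is taken from this page; the `transcribe` helper, the use of `librosa` for loading and resampling, and greedy argmax decoding are illustrative choices, not the authors' documented pipeline.

```python
# Sketch only: assumes `torch`, `transformers`, and `librosa` are installed
# and that the checkpoint loads with the standard Wav2Vec2 CTC classes.

TARGET_SAMPLE_RATE = 16_000  # the card states that input audio should be 16 kHz
MODEL_ID = "snu-nia-12/wav2vec2-xls-r-300m_nia12_phone-hiragana_japanese"


def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Return a Hiragana transcription (no word boundaries) for one audio file.

    `transcribe` is a hypothetical helper written for this example.
    """
    # Heavy dependencies are imported lazily so the module imports cheaply.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate and downmixes to mono on load.
    speech, _ = librosa.load(path, sr=TARGET_SAMPLE_RATE, mono=True)

    inputs = processor(
        speech,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: pick the most likely token per frame, then let
    # the processor collapse repeats and remove blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (a placeholder file name) would download the checkpoint on first use. Audio recorded at another rate is resampled by `librosa.load`, so the 16 kHz requirement is met without a separate conversion step.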

Test Results

CER: 9.34%


Trained on JSUT, a subset of JVS, the train and validation sets of Common Voice Japanese, and an in-house Japanese dataset. Tested on the test set of Common Voice Japanese.
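The card does not say how the CER above was computed. As a rough sketch of the usual definition, character error rate is the character-level edit distance between hypothesis and reference, divided by the reference length. The `edit_distance` and `cer` helpers below are hypothetical names; stripping whitespace before scoring and pooling edits over the whole corpus are assumptions chosen to match output that has no word boundaries.

```python
from typing import Iterable


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references: Iterable[str], hypotheses: Iterable[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: whitespace is removed on both sides, since the model
        # output contains no word boundaries.
        ref = "".join(ref.split())
        hyp = "".join(hyp.split())
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

For example, `cer(["こんにちは"], ["こんにちわ"])` counts one substitution over five reference characters. Averaging per-utterance CER instead of pooling edits would give a different number, so the two conventions should not be mixed when comparing against the reported figure.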

Dataset used to train snu-nia-12/wav2vec2-xls-r-300m_nia12_phone-hiragana_japanese