Wav2Vec2-XLS-R-300M-Japanese-Hiragana

This model is facebook/wav2vec2-xls-r-300m fine-tuned to transcribe Japanese speech into Hiragana characters, using JSUT, JVS, Common Voice, and an in-house dataset. The output sentences do not contain word boundaries. Audio inputs should be sampled at 16 kHz.
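The 16 kHz requirement above can be checked before running inference. The following is a minimal sketch, not the authors' own usage code: the helper names `wav_sample_rate` and `transcribe` are hypothetical, the model repository ID is passed in by the caller rather than assumed, and the input is assumed to be a WAV file. It relies on the `transformers` `pipeline` API for automatic speech recognition, which is imported only when a transcription is actually requested.

```python
import wave

# The card states that audio inputs should be sampled at 16 kHz.
EXPECTED_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) recorded in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def transcribe(path: str, model_id: str) -> str:
    """Transcribe a 16 kHz WAV file to Hiragana text.

    `model_id` is the Hugging Face repository ID of this model; it is a
    parameter here because this sketch does not assume a specific ID.
    """
    rate = wav_sample_rate(path)
    if rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz; resample first"
        )

    # Imported lazily so the sample-rate check works without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict whose "text" field holds the transcript,
    # which for this model is Hiragana without word boundaries.
    return asr(path)["text"]
```

The explicit guard makes a sampling-rate mismatch visible up front instead of relying on whatever resampling the loading code may apply.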

Test Results

CER: 9.34% on the Common Voice Japanese test set
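For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below illustrates that definition in plain Python. It is not the evaluation script behind the reported figure, and stripping whitespace before scoring is an assumption made here because the model's outputs contain no word boundaries.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # deletion
                    curr[j - 1] + 1,         # insertion
                    prev[j - 1] + (r != h),  # substitution (0 if equal)
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: whitespace is ignored, since outputs have no word boundaries.
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five gives a CER of 0.2 (20%).
print(cer("こんにちは", "こんにちわ"))
```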

Training

The model was trained on JSUT, a subset of JVS, the train and validation sets of Common Voice Japanese, and an in-house Japanese dataset. It was tested on the test set of Common Voice Japanese.

Dataset used to train snu-nia-12/wav2vec2-xls-r-300m_nia12_phone-hiragana_japanese

Evaluation results

Test CER on Common Voice Japanese (self-reported): 9.34%
