This is the wav2vec 2.0 XLSR-53 model fine-tuned on the Common Voice 8.0 dataset for Bahasa Indonesia using the
other splits (~32,000 audio samples). This model was used for research purposes to complete my undergraduate thesis.
- Remove symbols from the transcripts
- Convert digits (0, 1, ..., 9) to their word forms (nol, satu, ..., sembilan)
- Convert all characters to lowercase
- Resample the audio data to 16 kHz
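The text-side steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: the function name `normalize_transcript` is hypothetical, and treating every character other than an ASCII letter or whitespace as a "symbol" is an assumption. Audio resampling is left out because it needs an audio library.

```python
import re

# Indonesian words for the digits 0-9.
DIGIT_WORDS = {
    "0": "nol", "1": "satu", "2": "dua", "3": "tiga", "4": "empat",
    "5": "lima", "6": "enam", "7": "tujuh", "8": "delapan", "9": "sembilan",
}


def normalize_transcript(text: str) -> str:
    """Hypothetical helper: spell out digits, drop symbols, lowercase."""
    # Spell out each digit on its own, padded with spaces so words do not fuse.
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Assumption: anything that is not an ASCII letter or whitespace is a symbol.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(text.lower().split())


print(normalize_transcript("Saya punya 2 Kucing!"))  # saya punya dua kucing
```

Digits are converted one at a time, so `10` becomes `satu nol` rather than `sepuluh`; this follows the digit-by-digit wording of the list above.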
- Uses data collator from this example
- Learning rate = 1e-4
- Maximum Epochs = 30
- Batch size = 4 (limitations of compute resource)
- Early stopping = stop when WER has not improved for 2 validations
- Other parameters use the defaults from this config
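For illustration, the early-stopping rule can be written as a small pure-Python check. This is a sketch under assumptions: the name `early_stop_step` is hypothetical, "doesn't improve" is read as "not strictly lower than the best WER so far", and the actual training run most likely relied on a library callback rather than code like this.

```python
# Hyperparameters as listed above.
LEARNING_RATE = 1e-4
MAX_EPOCHS = 30
BATCH_SIZE = 4
PATIENCE = 2  # validations without WER improvement before stopping


def early_stop_step(wer_history, patience=PATIENCE):
    """Return the index of the validation at which training would stop,
    or None if the stopping rule never triggers."""
    best = float("inf")
    stale = 0  # validations since the last improvement
    for step, wer in enumerate(wer_history):
        if wer < best:
            best = wer
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None
```

With a made-up WER history of `[0.30, 0.25, 0.26, 0.27]`, the function returns `3`: the best value is reached at the second validation, and the two validations after it bring no improvement.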
The results are an average of 5 runs using the test split from the Common Voice dataset for Bahasa Indonesia.
Test Result: 15.6% WER
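As a reminder of what the metric measures, word error rate is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The sketch below is a generic pure-Python version; the functions `wer` and `average_wer` are hypothetical names, and the tool actually used to score this model is not stated here.

```python
from statistics import mean


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of a single hypothesis against a single reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def average_wer(run_wers) -> float:
    """Average the WER values of several independent runs."""
    return mean(run_wers)


print(wer("saya punya dua kucing", "saya punya kucing"))  # 0.25
```

In the example, one of four reference words is missing from the hypothesis, which gives a WER of 0.25.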