wav2vec 2.0 XLSR-53 Model

This is the wav2vec 2.0 XLSR-53 model fine-tuned on the Common Voice 8.0 dataset for Bahasa Indonesia, using the train, validation, and other splits (~32,000 audio samples). The model was trained for research purposes as part of my undergraduate thesis.

Preprocessing

Remove symbols from the transcripts
Convert digits (0, 1, ..., 9) to their word forms (nol, satu, ..., sembilan)
Convert all characters to lowercase
Resample the audio to 16 kHz
Use the data collator from this example
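
The text steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the thesis: the function name `normalize_transcript`, the choice to spell out each digit separately, and the rule that every character outside a-z counts as a "symbol" are all assumptions. Audio resampling and the data collator are not shown.

```python
import re

# Indonesian words for the digits 0-9.
DIGIT_WORDS = {
    "0": "nol", "1": "satu", "2": "dua", "3": "tiga", "4": "empat",
    "5": "lima", "6": "enam", "7": "tujuh", "8": "delapan", "9": "sembilan",
}


def normalize_transcript(text: str) -> str:
    """Lowercase a transcript, spell out digits, and strip symbols (hypothetical helper)."""
    text = text.lower()
    # Assumption: each ASCII digit is spelled out on its own, so "10" -> "satu nol".
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Assumption: anything that is not a-z or whitespace is treated as a symbol.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse the extra spaces introduced by the substitutions above.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_transcript("Saya punya 2 kucing!")` returns `"saya punya dua kucing"`. Note that this sketch also splits hyphenated reduplications such as "anak-anak" into two words, which may or may not match the original preprocessing.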

Hyperparameters used

Learning rate = 1e-4
Maximum Epochs = 30
Batch size = 4 (limited by available compute resources)
Early stopping = Stop when WER does not improve for 2 consecutive validations
Other parameters use the defaults from this config
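
As a rough sketch, the settings above and the early-stopping rule could be expressed like this. The names `FineTuneConfig` and `should_stop` are hypothetical, and the stopping check is one plausible reading of "no improvement for 2 validations" (neither of the last two validation WERs beats the best earlier value). The actual training script may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FineTuneConfig:
    """Hyperparameters listed in this section (hypothetical container)."""
    learning_rate: float = 1e-4
    max_epochs: int = 30
    batch_size: int = 4
    early_stopping_patience: int = 2


def should_stop(wer_history: Sequence[float], patience: int) -> bool:
    """Return True when the last `patience` validation WERs did not improve.

    Assumption: "improve" means strictly lower than the best WER seen
    before the last `patience` validations.
    """
    if len(wer_history) <= patience:
        return False  # Not enough validations yet to judge.
    best_before = min(wer_history[:-patience])
    return all(wer >= best_before for wer in wer_history[-patience:])
```

With `patience=2`, a validation history of `[0.30, 0.25, 0.26, 0.25]` triggers a stop, while `[0.30, 0.25, 0.26, 0.24]` does not. If training is done with the 🤗 Transformers `Trainer`, its `EarlyStoppingCallback` with `early_stopping_patience=2` offers comparable behavior; whether it was used here is not stated.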

Results

The result is the average of 5 runs, evaluated on the test split of the Common Voice dataset for Bahasa Indonesia.

Test Result: 15.6% WER
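
For readers unfamiliar with the metric, here is a minimal sketch of how word error rate (WER) can be computed and averaged over runs. It uses a plain word-level edit distance. The function names are hypothetical, and the thesis evaluation may have used a library implementation and corpus-level aggregation instead.

```python
from statistics import mean
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def average_wer(run_wers: Sequence[float]) -> float:
    """Average the test WER of several independent runs."""
    return mean(run_wers)
```

For example, `word_error_rate("saya punya dua kucing", "saya punya kucing")` is 0.25: one deleted word out of four reference words.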

References

Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers
Wav2Vec2-Large-XLSR-Indonesian by Indonesian NLP
