m-salman-a/wav2vec2-xlsr-53-common-voice-indonesian

	---
	language: id
	datasets:
	- mozilla-foundation/common_voice_8_0
	metrics:
	- wer
	---

	# wav2vec 2.0 XLSR-53 Model

	This is the [wav2vec 2.0 XLSR-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on the Bahasa Indonesia portion of the [Common Voice 8.0 dataset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0), using the `train`, `validation`, and `other` splits (~32,000 audio samples). The model was trained for research purposes as part of my undergraduate thesis.

	## Preprocessing
	1. Remove symbols from the transcripts
	2. Convert digits (0, 1, ..., 9) to their word forms (nol, satu, ..., sembilan)
	3. Convert all characters to lowercase
	4. Resample the audio data to 16 kHz
	5. Use the data collator from [this example](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2)
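
The text-normalization steps above (symbol removal, digit spelling, lowercasing) can be sketched in plain Python. This is a minimal illustration, not the exact script used for this model: the function name `normalize_transcript`, the `DIGIT_WORDS` table, the choice to replace symbols with a space, and the ASCII-only letter set are all assumptions. Multi-digit numbers are read digit by digit here.

```python
import re

# Assumed digit-to-word table for Bahasa Indonesia (illustrative only).
DIGIT_WORDS = {
    "0": "nol", "1": "satu", "2": "dua", "3": "tiga", "4": "empat",
    "5": "lima", "6": "enam", "7": "tujuh", "8": "delapan", "9": "sembilan",
}


def normalize_transcript(text: str) -> str:
    """Strip symbols, spell out digits, and lowercase one transcript."""
    # 1. Replace anything that is not an ASCII letter, digit, or whitespace
    #    with a space (assumption: symbols separate words rather than join them).
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # 2. Spell out each digit, padded with spaces so tokens do not fuse.
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # 3. Lowercase everything.
    text = text.lower()
    # Collapse the extra whitespace introduced by the steps above.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_transcript("Saya punya 2 kucing!"))
```

Replacing symbols with a space keeps reduplicated forms such as `rumah-rumah` as two tokens; deleting them instead would be an equally plausible reading of the step.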

	## Hyperparameters used
	- Learning rate = 1e-4
	- Maximum Epochs = 30
	- Batch size = 4 (limited by available compute resources)
	- Early stopping = stop when the WER does not improve for 2 validations
	- Other parameters use the defaults from [this config](https://huggingface.co/docs/transformers/v4.20.1/en/model_doc/wav2vec2#overview)
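
The early-stopping rule above can be sketched as a small patience counter. This is a hedged illustration in plain Python, assuming "2 validations" means two consecutive evaluations without a new best WER; the class `EarlyStopping`, the helper `evaluations_until_stop`, and the WER values in the usage example are hypothetical and not taken from the actual training runs. In 🤗 Transformers, `EarlyStoppingCallback(early_stopping_patience=2)` offers comparable behavior, though the card does not state whether it was used.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Patience-based early stopping on a metric where lower is better."""
    patience: int = 2
    best_wer: float = float("inf")
    bad_evals: int = 0

    def update(self, wer: float) -> bool:
        """Record one validation WER; return True when training should stop."""
        if wer < self.best_wer:
            # New best score: remember it and reset the counter.
            self.best_wer = wer
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def evaluations_until_stop(wer_history, patience=2):
    """Return (number of evaluations run, best WER seen) for a WER sequence."""
    stopper = EarlyStopping(patience=patience)
    for count, wer in enumerate(wer_history, start=1):
        if stopper.update(wer):
            return count, stopper.best_wer
    return len(wer_history), stopper.best_wer


if __name__ == "__main__":
    # Hypothetical validation WERs, for illustration only.
    print(evaluations_until_stop([0.40, 0.30, 0.31, 0.32, 0.20]))
```

With the hypothetical sequence in the usage example, training stops after the fourth evaluation, so the later improvement to 0.20 is never reached. That is the usual trade-off of a small patience value.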

	## Results
	The result is the average of 5 runs, evaluated on the `test` split of the Common Voice dataset for Bahasa Indonesia.

	Test result: 15.6% WER
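
For reference, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions. The sketch below computes it for a single sentence pair in plain Python. The function name `word_error_rate` and the example sentences are illustrative assumptions, and this is not necessarily the implementation used to produce the figure above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("saya makan nasi goreng", "saya makan nasi"))
```

Corpus-level WER is usually computed by summing errors and reference words over all utterances before dividing, which can differ from the mean of per-utterance values.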

	## References
	- [Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2)
	- [Wav2Vec2-Large-XLSR-Indonesian by Indonesian NLP](https://huggingface.co/indonesian-nlp/wav2vec2-large-xlsr-indonesian-baseline)