wav2vec2-large-xlsr-common_voice_13_0-id

Note: it is not recommended to try the model through this model card.

Instead, try it through the available Space (click here). You can then adapt the inference method used in the Gradio app script, or check out my GitHub repository (click here).

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice_13_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.4115
Wer: 0.4316
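
For readers unfamiliar with the metric, Wer is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the predicted transcript into the reference, divided by the number of reference words. The card does not say which library computed the score, so the following is only a minimal sketch of the standard definition. The function name and the example sentences are illustrative, not taken from the training script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not from the dataset): one dropped word out of four.
example_wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
print(example_wer)
```

Under this definition, a Wer of 0.4316 corresponds to roughly 43 word-level errors per 100 reference words.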

Model description

The model is based on the facebook/wav2vec2-large-xlsr-53 architecture and fine-tuned for Automatic Speech Recognition on the common_voice_13_0 dataset in Indonesian (id). It is designed to transcribe spoken language into written text.
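
The card does not describe how model outputs become text. Wav2vec2 checkpoints fine-tuned for speech recognition with Transformers usually carry a CTC head, so the sketch below assumes greedy CTC decoding: take the most likely token per audio frame, collapse consecutive repeats, then drop the blank token. The tiny vocabulary, the blank id of 0, and the `|` word delimiter are assumptions for illustration and may not match this model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join into text."""
    # Step 1: merge runs of identical ids (one token may span many frames).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the blank token, which only separates repeated tokens.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical vocabulary and per-frame argmax ids, for illustration only.
vocab = {0: "<pad>", 1: "|", 2: "i", 3: "n", 4: "b", 5: "u"}
frames = [2, 2, 0, 3, 2, 1, 2, 0, 4, 4, 5]
decoded = ctc_greedy_decode(frames, vocab)
print(decoded)
```

The blank token matters for doubled letters: without a blank between them, two identical consecutive predictions would collapse into one character.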

Intended uses & limitations

Intended Uses:

Automatic Speech Recognition for Indonesian speech data.
Transcription of spoken content in the common_voice_13_0 dataset.

Limitations:

The model's performance may vary on speech data outside the common_voice_13_0 dataset.
It may not perform well on languages other than Indonesian.

Training and evaluation data

The model was trained and evaluated on the Indonesian (id) subset of the common_voice_13_0 dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 30
mixed_precision_training: Native AMP
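
Two of these values follow from the others. The total train batch size is the per-device batch size times the gradient accumulation steps (16 × 2 = 32, assuming a single device), and a linear scheduler with warmup raises the learning rate from 0 to 0.0003 over the first 500 steps and then decays it linearly to 0. The sketch below illustrates that shape. The total of 4170 steps is an assumption estimated from the results table (about 139 steps per epoch over 30 epochs), not a figure stated on the card.

```python
def linear_schedule_lr(step, peak_lr=3e-4, warmup_steps=500, total_steps=4170):
    """Learning rate at a given optimizer step for linear warmup then decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from the peak to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Effective batch size per optimizer step, assuming one device.
effective_batch_size = 16 * 2

for step in (0, 250, 500, 2000, 4170):
    print(step, linear_schedule_lr(step))
```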

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
5.0656	2.88	400	2.7637	1.0
1.1404	5.76	800	0.4483	0.6088
0.3698	8.63	1200	0.4029	0.5278
0.2695	11.51	1600	0.3976	0.5036
0.2074	14.39	2000	0.3988	0.4793
0.1796	17.27	2400	0.3952	0.4590
0.1523	20.14	2800	0.3986	0.4463
0.1352	23.02	3200	0.4143	0.4374
0.121	25.9	3600	0.4022	0.4337
0.1085	28.78	4000	0.4115	0.4316
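
The table shows validation loss and Wer reaching their best values at different checkpoints. A short sketch that loads the rows above and picks the best step under each criterion makes this visible. The `Row` type and variable names are illustrative.

```python
from collections import namedtuple

Row = namedtuple("Row", "training_loss epoch step validation_loss wer")

# Values copied from the training results table.
rows = [
    Row(5.0656, 2.88, 400, 2.7637, 1.0),
    Row(1.1404, 5.76, 800, 0.4483, 0.6088),
    Row(0.3698, 8.63, 1200, 0.4029, 0.5278),
    Row(0.2695, 11.51, 1600, 0.3976, 0.5036),
    Row(0.2074, 14.39, 2000, 0.3988, 0.4793),
    Row(0.1796, 17.27, 2400, 0.3952, 0.4590),
    Row(0.1523, 20.14, 2800, 0.3986, 0.4463),
    Row(0.1352, 23.02, 3200, 0.4143, 0.4374),
    Row(0.121, 25.9, 3600, 0.4022, 0.4337),
    Row(0.1085, 28.78, 4000, 0.4115, 0.4316),
]

best_by_loss = min(rows, key=lambda r: r.validation_loss)
best_by_wer = min(rows, key=lambda r: r.wer)

print("lowest validation loss at step", best_by_loss.step)
print("lowest Wer at step", best_by_wer.step)
```

Validation loss is lowest at step 2400 (0.3952), while Wer keeps improving through step 4000 (0.4316), which is the checkpoint whose numbers match the evaluation results reported at the top of this card.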

Framework versions

Transformers 4.35.2
Pytorch 2.1.0+cu118
Datasets 2.15.0
Tokenizers 0.15.0
