wav2vec2-large-xlsr-common_voice_13_0-id

Note: trying the model through this model card is not recommended.

Alternatively, try it through the available Space (click here). You can then adapt the inference method from the Gradio app script, or check out my GitHub repository (click here).

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice_13_0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4115
  • Wer: 0.4316
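
For readers unfamiliar with the Wer figure, the following is a minimal sketch of how a word error rate is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the card does not state which text normalization or metric implementation produced the reported value.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))  # 0.25
```

Read this way, a Wer of 0.4316 corresponds to roughly 43 word errors per 100 reference words on the evaluation set.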

Model description

The model is based on the facebook/wav2vec2-large-xlsr-53 architecture and fine-tuned for Automatic Speech Recognition on the common_voice_13_0 dataset in Indonesian (id). It is designed to transcribe spoken language into written text.
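
As a rough illustration of the transcription flow, here is a minimal sketch using the standard `transformers` wav2vec 2.0 classes with greedy CTC decoding. It is not the author's Gradio script. The repository ID in `MODEL_ID` is an assumption and may differ from the actual model ID, `transcribe` is a hypothetical helper name, and `librosa` is just one possible way to load and resample audio.

```python
TARGET_SAMPLE_RATE = 16_000  # wav2vec 2.0 XLSR-53 was pretrained on 16 kHz audio

# Assumption: placeholder repository ID; replace with the actual model ID.
MODEL_ID = "arifagustyawan/wav2vec2-large-xlsr-53-id"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy (argmax) CTC decoding."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load as mono and resample to the rate the model expects.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE, mono=True)
    inputs = processor(
        speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; batch_decode collapses
    # repeated tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (a hypothetical file) would download the model on first use and return the predicted Indonesian text.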

Intended uses & limitations

Intended Uses:

  • Automatic Speech Recognition for Indonesian speech data.
  • Transcription of spoken content in the common_voice_13_0 dataset.

Limitations:

  • The model's performance may vary on speech data outside the common_voice_13_0 dataset.
  • It may not perform well on languages other than Indonesian.

Training and evaluation data

The model was trained on the Indonesian (id) subset of the common_voice_13_0 dataset, which was also used for evaluation.
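
A minimal sketch of how the Indonesian subset might be loaded with the `datasets` library follows. It assumes the dataset is the one published on the Hub as `mozilla-foundation/common_voice_13_0` (a gated dataset, so you need to accept its terms and be logged in), and that `test` is the split of interest; the card does not say which split was used. `load_indonesian_split` is a hypothetical helper name.

```python
def load_indonesian_split(split: str = "test"):
    """Load one split of Common Voice 13.0 (Indonesian) resampled to 16 kHz."""
    # Imported lazily so this module can be loaded without `datasets` installed.
    from datasets import Audio, load_dataset

    # Assumption: Hub dataset ID and "id" config name; the split is a guess.
    dataset = load_dataset(
        "mozilla-foundation/common_voice_13_0", "id", split=split
    )
    # Decode audio at the 16 kHz rate that wav2vec 2.0 models expect.
    return dataset.cast_column("audio", Audio(sampling_rate=16_000))
```

Each example then exposes the waveform under `example["audio"]["array"]` and the reference transcript under `example["sentence"]`.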

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
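
To make the relationship between these values concrete, here is a small sketch of the effective batch size and of a linear schedule with warmup. The schedule follows the usual "linear" definition (ramp up from 0, then decay linearly to 0). `TOTAL_STEPS` is an estimate, not a value from the card: the results table implies about 139 optimizer steps per epoch (4000 steps at epoch 28.78), giving roughly 4170 steps over 30 epochs. Single-device training is also assumed.

```python
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
TOTAL_STEPS = 4170  # Assumption: estimated as ~139 steps/epoch x 30 epochs


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup)
    # Decay linearly from base_lr down to 0 at the final step.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 32
for step in (0, 250, 500, 2000, TOTAL_STEPS):
    print(step, linear_lr(step))
```

The first line reproduces the listed total_train_batch_size of 32 (16 × 2), which is consistent with a single training device.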

Training results

Training Loss   Epoch   Step   Validation Loss   Wer
5.0656          2.88    400    2.7637            1.0
1.1404          5.76    800    0.4483            0.6088
0.3698          8.63    1200   0.4029            0.5278
0.2695          11.51   1600   0.3976            0.5036
0.2074          14.39   2000   0.3988            0.4793
0.1796          17.27   2400   0.3952            0.4590
0.1523          20.14   2800   0.3986            0.4463
0.1352          23.02   3200   0.4143            0.4374
0.121           25.9    3600   0.4022            0.4337
0.1085          28.78   4000   0.4115            0.4316
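
The table can also be inspected programmatically. The sketch below copies the logged values into a list and picks the checkpoints with the lowest validation loss and the lowest Wer; the `Row` type and variable names are illustrative only.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Values copied from the training results table.
RESULTS = [
    Row(5.0656, 2.88, 400, 2.7637, 1.0),
    Row(1.1404, 5.76, 800, 0.4483, 0.6088),
    Row(0.3698, 8.63, 1200, 0.4029, 0.5278),
    Row(0.2695, 11.51, 1600, 0.3976, 0.5036),
    Row(0.2074, 14.39, 2000, 0.3988, 0.4793),
    Row(0.1796, 17.27, 2400, 0.3952, 0.4590),
    Row(0.1523, 20.14, 2800, 0.3986, 0.4463),
    Row(0.1352, 23.02, 3200, 0.4143, 0.4374),
    Row(0.121, 25.9, 3600, 0.4022, 0.4337),
    Row(0.1085, 28.78, 4000, 0.4115, 0.4316),
]

best_by_loss = min(RESULTS, key=lambda row: row.val_loss)
best_by_wer = min(RESULTS, key=lambda row: row.wer)

print("lowest validation loss:", best_by_loss.step, best_by_loss.val_loss)
print("lowest Wer:", best_by_wer.step, best_by_wer.wer)
```

In these logs the validation loss is lowest at step 2400 (0.3952), while Wer keeps improving through the final logged step 4000 (0.4316), so the two criteria would select different checkpoints.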

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0
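
If you want to reproduce the environment, a small standard-library check like the following sketch can report where installed packages differ from the versions listed above. The helper name `version_mismatches` is hypothetical, and the PyPI distribution name `torch` is used for Pytorch; whether the reported torch version string includes the `+cu118` suffix depends on the installed build.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this card, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.35.2",
    "torch": "2.1.0+cu118",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches().items():
    print(f"{name}: expected {wanted}, found {installed}")
```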

Model size: 315M params (Safetensors), tensor type F32