Model description

This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for the Turkish language.

Training and evaluation data

The following datasets were used for fine-tuning:

Training procedure

To support both of the datasets above, custom pre-processing and loading steps were performed using the wav2vec2-turkish repo.

Training hyperparameters

The following hyperparameters were used for fine-tuning:

  • learning_rate 2e-4
  • num_train_epochs 10
  • warmup_steps 500
  • freeze_feature_extractor
  • mask_time_prob 0.1
  • mask_feature_prob 0.1
  • feat_proj_dropout 0.05
  • attention_dropout 0.05
  • final_dropout 0.1
  • activation_dropout 0.05
  • per_device_train_batch_size 8
  • per_device_eval_batch_size 8
  • gradient_accumulation_steps 8
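
The batch-size settings above combine into a larger effective batch per optimizer step. The sketch below works through that arithmetic; the helper name effective_batch_size and the num_devices parameter are illustrative assumptions, since the card does not state how many devices were used.

```python
# Batch-related values copied from the hyperparameter list above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 8


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step.

    num_devices is an assumption: the model card does not say how many
    devices were used, so it defaults to a single device here.
    """
    return per_device * accumulation_steps * num_devices


single_device = effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps)
print(f"Effective batch size on one device: {single_device}")
```

On a single device this gives 8 x 8 = 64 samples per optimizer step; with more devices the figure scales linearly.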

Framework versions

  • Transformers 4.17.0.dev0
  • PyTorch 1.10.1
  • Datasets 1.18.3
  • Tokenizers 0.10.3

Language Model

An n-gram language model was trained on Turkish Wikipedia articles using KenLM. The ngram-lm-wiki repo was used to generate the ARPA language model and convert it into binary format.

Evaluation Commands

Please install the unicode_tr package before running the evaluation. It is used for Turkish text processing.
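
Turkish needs dedicated text processing because its dotted and dotless "i" letters do not follow the default Unicode case mappings. The sketch below illustrates the problem with the standard library only; turkish_lower and turkish_upper are hypothetical helpers written for this illustration and are not the unicode_tr API or its implementation.

```python
def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: 'İ' -> 'i' and 'I' -> 'ı'."""
    # Handle the two special letters first, then defer to str.lower().
    return text.replace("İ", "i").replace("I", "ı").lower()


def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: 'i' -> 'İ' and 'ı' -> 'I'."""
    return text.replace("i", "İ").replace("ı", "I").upper()


word = "IŞIK"  # "light" in Turkish
print(word.lower())         # default mapping turns 'I' into dotted 'i'
print(turkish_lower(word))  # Turkish mapping keeps the dotless 'ı'
```

Without such handling, reference transcripts and model output can differ only in the "i" variants, which would inflate the error rates.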

  1. To evaluate on common_voice with split test:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv6-turkish --dataset common_voice --config tr --split test
  2. To evaluate on speech-recognition-community-v2/dev_data:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv6-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
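
The second command splits long recordings into overlapping chunks. The sketch below shows one way such windows could be laid out for --chunk_length_s 5.0 and --stride_length_s 1.0. It assumes the stride applies to both sides of each chunk, so consecutive chunks advance by chunk length minus twice the stride; chunk_windows is a hypothetical helper, and how eval.py actually consumes these flags is not shown in this card.

```python
def chunk_windows(total_s: float, chunk_length_s: float = 5.0, stride_length_s: float = 1.0):
    """Return (start, end) times in seconds of overlapping chunks.

    Assumption: each chunk carries stride_length_s of context on both
    sides, so the step between chunk starts is chunk_length_s - 2 * stride_length_s.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed twice stride_length_s")
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break  # the last chunk reached the end of the audio
        start += step
    return windows


for start, end in chunk_windows(12.0):
    print(f"{start:4.1f}s to {end:4.1f}s")
```

Under these assumptions a 12-second clip is covered by four windows that overlap by 2 seconds each, which gives the model context at chunk boundaries.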

Evaluation results:

Dataset                                  WER (%)   CER (%)
Common Voice 6.1 TR test split           8.83      2.37
Speech Recognition Community dev data    32.81     11.22
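
For readers unfamiliar with the two metrics, the sketch below shows the standard definitions: word error rate (WER) and character error rate (CER) are the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. This is a minimal illustration and not the metric code used by eval.py; the example sentences are invented.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


reference = "bugün hava çok güzel"  # invented example sentence
hypothesis = "bugün hava güzel"     # one word dropped
print(f"WER: {wer(reference, hypothesis):.2f}  CER: {cer(reference, hypothesis):.2f}")
```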