wav2vec2-xls-r-300m-cv7-turkish

Model description

This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish.

Training and evaluation data

The following datasets were used for finetuning:

Training procedure

To support both of the datasets above, custom pre-processing and loading steps were performed using the wav2vec2-turkish repo.

Training hyperparameters

The following hyperparameters were used for finetuning:

  • learning_rate 2e-4
  • num_train_epochs 10
  • warmup_steps 500
  • freeze_feature_extractor
  • mask_time_prob 0.1
  • mask_feature_prob 0.05
  • feat_proj_dropout 0.05
  • attention_dropout 0.05
  • final_dropout 0.05
  • activation_dropout 0.05
  • per_device_train_batch_size 8
  • per_device_eval_batch_size 8
  • gradient_accumulation_steps 8
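
As a quick sanity check on the batch settings listed above, the sketch below computes the effective batch size per optimizer step. The number of devices is not stated in this card, so `num_devices=1` is an assumption; the dictionary and the helper function are illustrative names only.

```python
# Batch-related values copied from the hyperparameter list above.
hyperparameters = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 8,
}


def effective_batch_size(hp, num_devices=1):
    """Samples contributing to one optimizer step.

    num_devices is an assumption: the card does not say how many
    devices were used for training.
    """
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(hyperparameters))  # 8 * 8 * 1 = 64
```

With a single device, each optimizer step therefore sees 64 samples; the figure scales linearly if more devices were used.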

Framework versions

  • Transformers 4.16.0.dev0
  • PyTorch 1.10.1
  • Datasets 1.17.0
  • Tokenizers 0.10.3

Language Model

An n-gram language model was trained on Turkish Wikipedia articles using KenLM. The ngram-lm-wiki repo was used to generate the ARPA LM and convert it into binary format.
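
The two KenLM steps described above (estimating an ARPA model, then converting it to binary) can be sketched roughly as follows. This is not the ngram-lm-wiki repo's actual script: the n-gram order of 5 and all file names are assumptions, and the commands only run if the KenLM binaries `lmplz` and `build_binary` are on the PATH.

```python
import shutil
import subprocess


def kenlm_commands(corpus_path, arpa_path, binary_path, order=5):
    """Return the two KenLM command lines as argv lists.

    order=5 is an assumption; the card does not state the n-gram order.
    """
    # Step 1: estimate an n-gram model from plain text and write it as ARPA.
    estimate = ["lmplz", "-o", str(order), "--text", corpus_path, "--arpa", arpa_path]
    # Step 2: convert the ARPA file into KenLM's binary format.
    convert = ["build_binary", arpa_path, binary_path]
    return [estimate, convert]


def build_lm(corpus_path, arpa_path, binary_path, order=5):
    """Run both steps, failing early if a KenLM tool is missing."""
    for cmd in kenlm_commands(corpus_path, arpa_path, binary_path, order):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found; is KenLM installed?")
        subprocess.run(cmd, check=True)


# Hypothetical file names, printed only to show the resulting commands.
for cmd in kenlm_commands("wiki_tr.txt", "lm.arpa", "lm.binary"):
    print(" ".join(cmd))
```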

Evaluation Commands

Please install the unicode_tr package before running the evaluation. It is used for Turkish text processing.
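
To illustrate why Turkish text needs special handling, the sketch below shows how Python's default lowercasing treats the dotted and dotless I. The `turkish_lower` helper is a standard-library stand-in written for this example; it is not the unicode_tr API and is not necessarily what eval.py does.

```python
def turkish_lower(text):
    """Lowercase text using the Turkish casing rules for I and İ.

    Illustrative helper only; it is not part of unicode_tr.
    """
    # Turkish pairs: I <-> ı (dotless) and İ <-> i (dotted).
    return text.replace("I", "ı").replace("İ", "i").lower()


word = "ISPARTA İZMİR"

# Default lowercasing maps I to a dotted i, and İ to "i" plus a
# combining dot (two code points), which would distort WER/CER.
print(word.lower())
print(len("İ".lower()))

# Turkish-aware lowercasing keeps the dotless/dotted distinction.
print(turkish_lower(word))
```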

  1. To evaluate on mozilla-foundation/common_voice_7_0 with split test:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv7-turkish --dataset mozilla-foundation/common_voice_7_0 --config tr --split test
  2. To evaluate on speech-recognition-community-v2/dev_data:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv7-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
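
The second command passes `--chunk_length_s 5.0 --stride_length_s 1.0`. Assuming eval.py forwards these to chunked inference, where each chunk overlaps its neighbours by the stride on both sides, the windows over a long recording would look roughly like the sketch below. It is a simplified illustration of overlapping windows, not the actual inference code, and `chunk_windows` is a hypothetical helper.

```python
def chunk_windows(duration_s, chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end) times in seconds of overlapping chunks.

    Assumes the stride applies to both the left and right side of a
    chunk, so consecutive chunks advance by chunk - 2 * stride.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed twice stride_length_s")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last chunk reached the end of the audio
        start += step
    return windows


# A hypothetical 12-second recording with the settings from the command.
for start, end in chunk_windows(12.0):
    print(f"{start:.1f}s to {end:.1f}s")
```

With these settings the chunks advance by 3 seconds, so every stretch of audio is seen with at least 1 second of context on either side, except at the very start and end of the recording.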

Evaluation results:

Dataset                                  WER     CER
Common Voice 7 TR test split             8.62    2.26
Speech Recognition Community dev data    30.87   10.69
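
For readers unfamiliar with the two metrics, the sketch below computes word error rate (WER) and character error rate (CER) from edit distance. It assumes the figures above are percentages and that CER counts spaces as characters; eval.py may normalize text differently, and the example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(references, hypotheses):
    """Word error rate in percent over a list of sentence pairs."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return 100.0 * errors / total


def cer(references, hypotheses):
    """Character error rate in percent (spaces counted as characters)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


# Made-up example: one wrong character in one of four words.
refs = ["bugün hava çok güzel"]
hyps = ["bugün hava cok güzel"]
print(wer(refs, hyps), cer(refs, hyps))
```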