Commit 81bc62e by Dauren-Nur
1 parent: b5127db

Create README.md

Model created by ISSAI. This is not an official upload.
Mussakhojayeva, S.; Dauletbek, K.; Yeshpanov, R.; Varol, H.A. Multilingual Speech Recognition for Turkic Languages. Information 2023, 14, 74. (https://doi.org/10.3390/info14020074)

At ISSAI, we have previously developed automatic speech recognition systems for the Kazakh language. Now, leveraging our advances in Kazakh ASR, we have extended our work to a multilingual ASR model that can recognize ten Turkic languages—Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek.

The multilingual models that were trained using joint speech data performed more robustly than the baseline monolingual models, with the best model achieving an average character and word error rate reduction of 56% and 54%, respectively.
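The error rates quoted above can be illustrated with a minimal Python sketch. It assumes the standard definitions (edit distance divided by the reference length, over words for WER and over characters for CER) and assumes the reported reductions are relative to the monolingual baseline; the commit message does not spell out either point. The function names are hypothetical and are not taken from the TurkicASR code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline, improved):
    """Relative error rate reduction (assumed interpretation of the reported figures)."""
    return (baseline - improved) / baseline
```

With these definitions, a hypothetical baseline WER of 0.50 and a multilingual WER of 0.25 would correspond to a relative reduction of 50%; these numbers are purely illustrative and are not results from the paper.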

The results of the experiments demonstrated that character and word error rate reduction was more likely when multilingual models were trained with data from related Turkic languages than when they were developed using data from unrelated, non-Turkic languages, such as English and Russian.

The study also presented an open-source Turkish speech corpus. The corpus contains 218.2 hours of transcribed speech across 186,171 utterances and is the largest publicly available Turkish dataset of its kind. The datasets and code used to train the models are available for download at https://github.com/IS2AI/TurkicASR.

Files changed (1)
  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ datasets:
+ - Shirali/ISSAI_KSC_335RS_v_1_1
+ language:
+ - kk
+ - en
+ - ru
+ - tr
+ - ky
+ - uz
+ - ug
+ - tt
+ - az
+ - ba
+ metrics:
+ - wer
+ pipeline_tag: automatic-speech-recognition
+ ---
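
The `language` list in this front matter can be cross-checked against the ten Turkic languages named in the commit message. The sketch below is a rough check only: the mapping from language names to codes is my own assumption (ISO 639-1 codes, with `sah` for Sakha, which has no two-letter code) and is not part of the commit.

```python
# Languages named in the commit message, with assumed ISO 639 codes.
described = {
    "Azerbaijani": "az", "Bashkir": "ba", "Chuvash": "cv", "Kazakh": "kk",
    "Kyrgyz": "ky", "Sakha": "sah", "Tatar": "tt", "Turkish": "tr",
    "Uyghur": "ug", "Uzbek": "uz",
}

# Codes listed under `language:` in the README front matter above.
metadata_languages = ["kk", "en", "ru", "tr", "ky", "uz", "ug", "tt", "az", "ba"]

described_codes = set(described.values())

# Described in the text but absent from the metadata.
missing = sorted(described_codes - set(metadata_languages))
# Present in the metadata but not among the ten Turkic languages.
extra = sorted(set(metadata_languages) - described_codes)

print("missing from metadata:", missing)
print("not in described list:", extra)
```

Under these assumptions the check reports `cv` and `sah` as missing and `en` and `ru` as extra. English and Russian appear in the commit message as the unrelated languages used in the experiments, which may explain their presence; whether Chuvash and Sakha were left out deliberately is not stated.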