metadata
tags:
  - generated_from_keras_callback
model-index:
  - name: wav2vec2-xls-r-300m-mixed
    results: []

wav2vec2-xls-r-300m-mixed

Finetuned https://huggingface.co/facebook/wav2vec2-xls-r-300m on https://github.com/huseinzol05/malaya-speech/tree/master/data/mixed-stt

This model was finetuned on three languages:

  1. Malay
  2. Singlish
  3. Mandarin

This model was trained on a single RTX 3090 Ti with 24 GB of VRAM, provided by https://mesolitica.com/.

Evaluation set

The evaluation set comes from https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt, with the following sizes:

len(malay), len(singlish), len(mandarin)
-> (765, 3579, 614)
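
Because the three subsets differ so much in size, the mixed scores reported in this card are dominated by Singlish. The sketch below is a quick arithmetic check, not part of the evaluation notebook: it weights the per-language CER and WER (without LM) figures from this card by the subset sizes above. The helper name `weighted_mean` is hypothetical. The result appears to match the reported mixed scores, which suggests they are averages over all 4958 utterances pooled together.

```python
# Evaluation set sizes and per-language scores (without LM), copied from this card.
sizes = {"malay": 765, "singlish": 3579, "mandarin": 614}
cer = {
    "malay": 0.053659683623049854,
    "singlish": 0.04174804195104746,
    "mandarin": 0.04211892733885779,
}
wer = {
    "malay": 0.22565751242221832,
    "singlish": 0.10734402150682842,
    "mandarin": 0.09817787449869257,
}


def weighted_mean(scores, weights):
    """Average the scores, weighting each language by its number of utterances."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total


mixed_cer = weighted_mean(cer, sizes)
mixed_wer = weighted_mean(wer, sizes)
print(f"size-weighted CER: {mixed_cer:.6f}")
print(f"size-weighted WER: {mixed_wer:.6f}")
```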

It achieves the following results on the evaluation set based on evaluate-wav2vec2-xls-r-300m-mixed.ipynb:

Mixed evaluation,

CER: 0.04363189219453221
WER: 0.12446419219809059
CER with LM: 0.03621180629932558
WER with LM: 0.09152993800218129

Malay evaluation,

CER: 0.053659683623049854
WER: 0.22565751242221832
CER with LM: 0.036930421149001316
WER with LM: 0.14256712242006359

Singlish evaluation,

CER: 0.04174804195104746
WER: 0.10734402150682842
CER with LM: 0.03538238462620066
WER with LM: 0.08103191123663189

Mandarin evaluation,

CER: 0.04211892733885779
WER: 0.09817787449869257
CER with LM: 0.040151154521006656
WER with LM: 0.08913415903511501
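
For readers unfamiliar with the metrics above, the following is a minimal sketch of how CER and WER are conventionally defined: the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. This is an illustration under those assumptions, not the code from the evaluation notebook, which may normalize text or tokenize Mandarin differently. The function names are hypothetical, and a non-empty reference is assumed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# One dropped word out of four reference words.
print(wer("saya suka makan nasi", "saya suka nasi"))
# One substituted character out of five reference characters.
print(cer("makan", "makam"))
```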

The language model used for the "with LM" results is from https://huggingface.co/huseinzol05/language-model-bahasa-manglish-combined
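
To make the effect of the language model easier to compare across languages, the sketch below computes the relative WER reduction from the figures reported in this card. It is plain arithmetic on those numbers. The helper name `relative_reduction` is hypothetical.

```python
# WER without and with the language model, copied from this card.
results = {
    "mixed": {"wer": 0.12446419219809059, "wer_lm": 0.09152993800218129},
    "malay": {"wer": 0.22565751242221832, "wer_lm": 0.14256712242006359},
    "singlish": {"wer": 0.10734402150682842, "wer_lm": 0.08103191123663189},
    "mandarin": {"wer": 0.09817787449869257, "wer_lm": 0.08913415903511501},
}


def relative_reduction(before, after):
    """Fraction of the original error removed by adding the language model."""
    return (before - after) / before


reductions = {
    name: relative_reduction(scores["wer"], scores["wer_lm"])
    for name, scores in results.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} relative WER reduction")
```

On these figures the language model helps Malay the most and Mandarin the least.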