wav2vec2-xls-r-300m-mixed

Fine-tuned https://huggingface.co/facebook/wav2vec2-xls-r-300m on https://github.com/huseinzol05/malaya-speech/tree/master/data/mixed-stt

This model was fine-tuned on three languages:

  1. Malay
  2. Singlish
  3. Mandarin

This model was trained on a single RTX 3090 Ti with 24 GB of VRAM, provided by https://mesolitica.com/.
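A minimal inference sketch using the `transformers` library is shown below. The repo id is an assumption based on this card's title and the training credit above; replace it with the actual Hub id if it differs.

```python
# Sketch: greedy CTC transcription with this checkpoint via transformers.
# MODEL_ID is an assumed Hub repo id, not confirmed by this card.
MODEL_ID = "mesolitica/wav2vec2-xls-r-300m-mixed"

def transcribe(waveform, sampling_rate=16000):
    """Transcribe a 1-D float waveform (mono, 16 kHz audio) to text."""
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy decoding: pick the highest-scoring token at each frame,
    # then collapse repeats and blanks inside batch_decode.
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]
```

Greedy decoding corresponds to the plain CER/WER numbers below; the "with LM" numbers require beam-search decoding with the language model linked at the end of this card.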

Evaluation set

The evaluation set comes from https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt with the following sizes:

  1. Malay: 765
  2. Singlish: 3579
  3. Mandarin: 614

It achieves the following results on the evaluation set, based on evaluate-gpu.ipynb:

| Evaluation | CER | WER | CER with LM | WER with LM |
|---|---|---|---|---|
| Mixed | 0.0481054244857041 | 0.1322198446007387 | 0.041196586938584696 | 0.09880169127621556 |
| Malay | 0.051636391937588406 | 0.19561999547293663 | 0.03917689630621449 | 0.12710746406824835 |
| Singlish | 0.0494915200071987 | 0.12763802881676573 | 0.04271234986432335 | 0.09677160640413336 |
| Mandarin | 0.035626554824269824 | 0.07993515937860181 | 0.03487760945087219 | 0.07536807168546154 |
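The CER/WER figures above are standard edit-distance metrics. A minimal, dependency-free sketch of their definitions (the actual numbers come from evaluate-gpu.ipynb, which may use a different implementation):

```python
# Minimal sketch of the WER/CER definitions behind the numbers above.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance over reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: edit distance over reference character count."""
    return edit_distance(ref, hyp) / len(ref)

print(wer("saya suka makan", "saya suka minum"))  # 1 substitution / 3 words
```

Corpus-level scores average edit distances over all utterances before dividing by the total reference length, so they are not a simple mean of per-utterance rates.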

The language model is from https://huggingface.co/huseinzol05/language-model-bahasa-manglish-combined
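The "with LM" rows above come from combining the acoustic model's scores with this language model during beam-search decoding (shallow fusion). A toy, dependency-free illustration of the re-ranking idea; the alpha/beta weights here are made up, and real decoders such as pyctcdecode tune them on a dev set:

```python
# Toy illustration of shallow-fusion scoring in LM-assisted CTC decoding:
# candidate transcripts are re-ranked by acoustic score plus weighted LM score.
# alpha/beta are hypothetical values, not taken from this model's decoder.

def fused_score(acoustic_logprob, lm_logprob, n_words, alpha=0.5, beta=1.0):
    # beta * n_words is a word-insertion bonus that offsets the LM's
    # preference for shorter hypotheses.
    return acoustic_logprob + alpha * lm_logprob + beta * n_words

hyps = [
    # (text, acoustic log-prob, LM log-prob) -- made-up numbers
    ("saya suka makan nasi", -4.0, -6.0),
    ("saya suka makan nazi", -3.8, -15.0),
]
best = max(hyps, key=lambda h: fused_score(h[1], h[2], len(h[0].split())))
print(best[0])  # prints "saya suka makan nasi"
```

Even though the second hypothesis scores slightly better acoustically, the LM penalizes it heavily, so the fused score prefers the fluent transcript.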
