metadata

title: Urdu ASR SOTA
emoji: 👨‍🎤
colorFrom: green
colorTo: blue
sdk: gradio
app_file: Gradio/app.py
pinned: true
license: apache-2.0

Urdu Automatic Speech Recognition State of the Art Solution

Automatic speech recognition for Urdu using Facebook's wav2vec2-xls-r-300m model, fine-tuned on the Urdu subset of the mozilla-foundation/common_voice_8_0 dataset.

Model Fine-tuning

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset.

It achieves the following results on the evaluation set:

Loss: 0.9889
WER: 0.5607
CER: 0.2370
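WER and CER are both edit-distance ratios: the number of word (or character) substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the length of the reference. The sketch below illustrates that definition in plain Python. It is not the metric code used by eval.py; `edit_distance` and `error_rate` are hypothetical helper names, and the English example sentences are made up for readability.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate: total edits divided by total reference length."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(references, hypotheses):
        if unit == "word":
            ref_seq, hyp_seq = ref.split(), hyp.split()
        else:  # character level
            ref_seq, hyp_seq = list(ref), list(hyp)
        total_edits += edit_distance(ref_seq, hyp_seq)
        total_len += len(ref_seq)
    return total_edits / total_len


refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
print(error_rate(refs, hyps, "word"))  # 2 edits / 6 reference words
print(error_rate(refs, hyps, "char"))  # 5 edits / 22 reference characters
```

This pools edits over the whole corpus rather than averaging per-utterance rates, which is how corpus-level WER is usually reported; a single long, badly recognised clip therefore weighs more than a short one.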

Quick Prediction

Install all dependencies from the requirements.txt file (pip install -r requirements.txt), then run the code below to predict the text:

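from datasets import Audio, load_dataset
from transformers import pipeline
# Local folder holding the fine-tuned checkpoint (see the directory structure)
model = "Model"
# Load the Urdu test split from the local Data/ folder; Common Voice metadata files are tab-separated
data = load_dataset("Data", "ur", split="test", delimiter="\t")
def path_adjust(batch):
    # Prefix each clip file name with its folder so the Audio feature can find it
    batch["path"] = "Data/ur/clips/" + str(batch["path"])
    return batch
data = data.map(path_adjust)
# Decode each clip and resample it to the 16 kHz rate the model expects
data = data.cast_column("path", Audio(sampling_rate=16_000))
sample = next(iter(data))
asr = pipeline("automatic-speech-recognition", model=model)
# Transcribe in 5-second chunks with 1-second strides so long clips can be processed
prediction = asr(sample["path"]["array"], chunk_length_s=5, stride_length_s=1)
print(prediction)
# => {'text': 'اب یہ ونگین لمحاتانکھار دلمیں میںفوث کریلیا اجائ'}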
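The chunk_length_s and stride_length_s arguments let the pipeline transcribe audio longer than one window: roughly, the signal is cut into overlapping 5-second chunks, the 1-second overlaps on each side are dropped from each chunk's output, and the pieces are joined. The function below is a simplified illustration of where those window boundaries fall. `chunk_windows` is a hypothetical helper, not part of 🤗 Transformers, and it assumes the stride is applied on both sides of each chunk; it does not reproduce the pipeline's exact handling of the first and last chunks.

```python
def chunk_windows(num_samples, sampling_rate=16_000, chunk_length_s=5.0, stride_length_s=1.0):
    """Illustrative sketch: split a signal into overlapping (start, end) sample windows.

    Assumption: each window is chunk_length_s long and overlaps its neighbours by
    stride_length_s on each side, so consecutive windows start
    chunk_length_s - 2 * stride_length_s seconds apart.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk_length_s must be larger than 2 * stride_length_s")
    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        windows.append((start, end))
        if end == num_samples:  # the last window reaches the end of the signal
            break
    return windows


# A 12-second clip at 16 kHz, chunked with the same settings as the pipeline call above
for start, end in chunk_windows(12 * 16_000):
    print(f"{start / 16_000:.1f}s -> {end / 16_000:.1f}s")
```

With these settings the windows start every 3 seconds (0-5 s, 3-8 s, 6-11 s, 9-12 s), so every part of the clip is seen away from a chunk edge at least once.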

Evaluation Commands

To evaluate on the test split of mozilla-foundation/common_voice_8_0, copy and paste the following command into a terminal:

python3 eval.py --model_id Model --dataset Data --config ur --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs

Or run the provided shell script:

bash run_eval.sh

Language Model

The language model follows the Hugging Face guide "Boosting Wav2Vec2 with n-grams in 🤗 Transformers", which has three steps:

Get suitable Urdu text data for a language model
Build an n-gram with KenLM
Combine the n-gram with a fine-tuned Wav2Vec2 checkpoint
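In the guide's approach, the KenLM n-gram is plugged into pyctcdecode, whose beam search adds a weighted language-model score to the acoustic (CTC) score of each candidate transcription. The toy below illustrates that combination in plain Python under simplifying assumptions: a word-bigram model with add-one smoothing stands in for KenLM's higher-order, Kneser-Ney-smoothed n-gram, and it rescores a finished list of candidates instead of fusing the LM inside the beam search. `BigramLM`, `rescore`, the corpus, and the acoustic scores are all hypothetical; pyctcdecode exposes `alpha` (LM weight) and `beta` (word bonus) parameters with similar roles, but its exact scoring details may differ.

```python
import math
from collections import Counter


class BigramLM:
    """Toy word-bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()  # how often each word appears as a history
        self.bigrams = Counter()   # how often each (history, next word) pair appears
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # possible next words: every training word plus the end-of-sentence token
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the smoothed bigram model."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab)
        return sum(
            math.log((self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + v))
            for prev, word in zip(tokens[:-1], tokens[1:])
        )


def rescore(candidates, lm, alpha=0.5, beta=1.0):
    """Pick the best text from (text, acoustic_log_prob) pairs.

    score = acoustic_log_prob + alpha * lm_log_prob + beta * number_of_words
    """
    def score(item):
        text, acoustic = item
        return acoustic + alpha * lm.log_prob(text) + beta * len(text.split())

    return max(candidates, key=score)[0]


corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
lm = BigramLM(corpus)
# Hypothetical acoustic scores: the acoustic model alone slightly prefers the misspelling
candidates = [("the cat sat on the mat", -5.0), ("the cat sad on the mat", -4.8)]
print(rescore(candidates, lm))             # LM pulls the choice to "the cat sat on the mat"
print(rescore(candidates, lm, alpha=0.0))  # acoustic score alone picks "the cat sad on the mat"
```

The same mechanism is why the LM helps with Urdu output like the sample prediction above, where words are misspelled or run together: word sequences that are common in the text corpus get a higher combined score.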

Install kenlm and pyctcdecode before running the notebook.

pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode

Eval Results

Metric     Without LM    With LM
WER (%)    56.21         46.37
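Assuming both figures are WER percentages on the same test split, the gain from the language model can be expressed in absolute and relative terms:

```python
wer_without_lm = 56.21  # WER (%) without the language model (Eval Results)
wer_with_lm = 46.37     # WER (%) with the n-gram language model (Eval Results)

# Absolute gain in percentage points, and relative gain as a share of the baseline
absolute_gain = wer_without_lm - wer_with_lm
relative_gain = absolute_gain / wer_without_lm * 100

print(f"Absolute WER reduction: {absolute_gain:.2f} points")
print(f"Relative WER reduction: {relative_gain:.1f}%")
```

That is about 9.84 points, or roughly 17.5% fewer word errors than the model without the language model.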

Directory Structure

<root directory>
    |
    .- README.md
    |
    .- Data/
    |
    .- Model/
    |
    .- Images/
    |
    .- Sample/
    |
    .- Gradio/
    |
    .- Eval Results/
    |     |
    |     .- With LM/
    |     |
    |     .- Without LM/
    |     | ...
    |
    .- notebook.ipynb
    |
    .- run_eval.sh
    |
    .- eval.py
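Before running eval.py or run_eval.sh, a small check like the one below can confirm that a local copy matches this layout. It is a hypothetical helper, not part of the repository; it assumes the names are relative to the repository root and only tests that the top-level entries exist, without looking inside them.

```python
from pathlib import Path

# Top-level entries from the directory structure (assumed relative to the repo root)
EXPECTED_ENTRIES = [
    "README.md",
    "Data",
    "Model",
    "Images",
    "Sample",
    "Gradio",
    "Eval Results",
    "notebook.ipynb",
    "run_eval.sh",
    "eval.py",
]


def missing_entries(root="."):
    """Return the expected files/folders that do not exist under root."""
    root_path = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root_path / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```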

Gradio App

SOTA

[x] Add Language Model
[x] Webapp/API
[ ] Denoise Audio
[ ] Text Processing
[ ] Spelling Mistakes
[x] Hyperparameter optimization
[ ] Training for 300 epochs with batch size 64
[ ] Improved Language Model
[ ] Contribute to Urdu ASR Audio Dataset

Robust Speech Recognition Challenge 2022

This project was the result of the Hugging Face Robust Speech Recognition Challenge. I was one of the winners, with four state-of-the-art ASR models. Check out my SOTA checkpoints:

Urdu
Arabic
Punjabi
Irish

Space: kingabzpro/Urdu-ASR-SOTA
