title: Urdu ASR SOTA
emoji: 👨🎤
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 2.8.12
app_file: Gradio/app.py
pinned: false
license: apache-2.0
Urdu Automatic Speech Recognition State of the Art Solution
Automatic speech recognition using Facebook's wav2vec2-xls-r-300m model and the mozilla-foundation common_voice_8_0 Urdu dataset.
Model Fine-tuning
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9889
- Wer: 0.5607
- Cer: 0.2370
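For readers unfamiliar with the metrics above, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. The helper names `edit_distance`, `wer`, and `cer` are illustrative and are not taken from this repository's evaluation code, which may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Toy English example (not from the Urdu evaluation set).
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In the toy example, one substituted word and one dropped word out of six reference words give a WER of about 0.333.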
Quick Prediction
Install all dependencies from the requirment.txt file, then run the code below to predict the text:
import torch
from datasets import load_dataset, Audio
from transformers import pipeline

# Local directories holding the fine-tuned model and the Common Voice data
model = "Model"
data = load_dataset("Data", "ur", split="test", delimiter="\t")

def path_adjust(batch):
    # Prefix each clip file name with the local clips directory
    batch["path"] = "Data/ur/clips/" + str(batch["path"])
    return batch

data = data.map(path_adjust)
# Decode the audio files and resample them to 16 kHz
sample_iter = iter(data.cast_column("path", Audio(sampling_rate=16_000)))
sample = next(sample_iter)

asr = pipeline("automatic-speech-recognition", model=model)
# Transcribe in 5-second chunks with a 1-second stride
prediction = asr(
    sample["path"]["array"], chunk_length_s=5, stride_length_s=1)
prediction
# => {'text': 'اب یہ ونگین لمحاتانکھار دلمیں میںفوث کریلیا اجائ'}
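The `chunk_length_s` and `stride_length_s` arguments above split long audio into overlapping windows. The sketch below illustrates the general idea, assuming a symmetric stride on both sides of each chunk; `chunk_windows` is a hypothetical helper written for this explanation, not the pipeline's own implementation, which may differ in detail.

```python
def chunk_windows(num_samples, sampling_rate=16_000,
                  chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end, keep_start, keep_end) sample indices per chunk.

    [start, end) is the audio fed to the model; [keep_start, keep_end) is
    the part whose predictions are kept once the overlapping stride
    regions are discarded.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # consecutive windows overlap by 2 * stride
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride")

    windows = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        is_first = start == 0
        is_last = end == num_samples
        # Keep the full edge on the first and last chunk; trim the stride elsewhere.
        keep_start = start if is_first else start + stride
        keep_end = end if is_last else end - stride
        windows.append((start, end, keep_start, keep_end))
        if is_last:
            break
        start += step
    return windows


# A 12-second clip at 16 kHz, using the same settings as the call above.
for start, end, keep_start, keep_end in chunk_windows(12 * 16_000):
    print(start / 16_000, end / 16_000, keep_start / 16_000, keep_end / 16_000)
```

With these settings a 12-second clip is covered by windows at 0 to 5 s, 3 to 8 s, 6 to 11 s, and 9 to 12 s, and the kept regions join up without gaps. The overlap gives the model context on both sides of each chunk boundary.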
Evaluation Commands
To evaluate on mozilla-foundation/common_voice_8_0 with split test, you can copy and paste the command into the terminal:
python3 eval.py --model_id Model --dataset Data --config ur --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs
Or run the simple shell script:
bash run_eval.sh
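For orientation, here is a minimal sketch of how a script could accept the flags used in the command above. The flag names come from that command; the types, defaults, help strings, and the `build_parser` function are assumptions made for illustration and may not match the actual eval.py.

```python
import argparse


def build_parser():
    """Build a parser for the evaluation flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Evaluate a fine-tuned ASR model (illustrative sketch)")
    parser.add_argument("--model_id", type=str, required=True,
                        help="Model directory or identifier")
    parser.add_argument("--dataset", type=str, required=True,
                        help="Dataset directory or name")
    parser.add_argument("--config", type=str, required=True,
                        help="Dataset configuration, e.g. the language code")
    parser.add_argument("--split", type=str, required=True,
                        help="Dataset split to evaluate")
    parser.add_argument("--chunk_length_s", type=float, default=None,
                        help="Chunk length in seconds")
    parser.add_argument("--stride_length_s", type=float, default=None,
                        help="Stride length in seconds")
    parser.add_argument("--log_outputs", action="store_true",
                        help="Write predictions and targets to log files")
    return parser


# Parse the same arguments as the command above, passed as an explicit list.
args = build_parser().parse_args([
    "--model_id", "Model", "--dataset", "Data", "--config", "ur",
    "--split", "test", "--chunk_length_s", "5.0",
    "--stride_length_s", "1.0", "--log_outputs",
])
print(args)
```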
Language Model
Boosting Wav2Vec2 with n-grams in 🤗 Transformers
- Get suitable Urdu text data for a language model
- Build an n-gram with KenLM
- Combine the n-gram with a fine-tuned Wav2Vec2 checkpoint
Install kenlm and pyctcdecode before running the notebook.
pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
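The idea behind boosting an acoustic model with an n-gram can be shown with a toy example. The sketch below is not KenLM or pyctcdecode: it trains a tiny add-one-smoothed bigram model in pure Python and uses it to rescore two candidate transcripts. The corpus, candidate scores, and the weights `alpha` and `beta` are all invented for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Return log P(word | previous word), add-one smoothed over the training vocabulary."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, word)] += 1
    vocab_size = len(vocab)

    def log_prob(prev, word):
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

    return log_prob


def lm_score(log_prob, text):
    """Sum of bigram log-probabilities over the sentence, including boundary tokens."""
    tokens = ["<s>"] + text.split() + ["</s>"]
    return sum(log_prob(p, w) for p, w in zip(tokens, tokens[1:]))


def rescore(candidates, log_prob, alpha=0.5, beta=1.0):
    """Pick the candidate maximizing acoustic + alpha * LM + beta * word count."""
    def total(item):
        text, acoustic = item
        return acoustic + alpha * lm_score(log_prob, text) + beta * len(text.split())
    return max(candidates, key=total)[0]


# Toy text corpus and (transcript, acoustic log-score) candidates.
log_prob = train_bigram_lm(["the cat sat on the mat", "the dog sat on the rug"])
candidates = [("the cat sat on the mat", -4.2), ("the cat sad on the mat", -4.0)]
print(rescore(candidates, log_prob, alpha=0.0))  # acoustic score only
print(rescore(candidates, log_prob, alpha=0.5))  # acoustic score plus language model
```

With `alpha=0.0` the acoustically stronger but misspelled candidate wins; with the language model weighted in, the transcript made of word pairs seen in the text corpus is preferred. Decoders that use an n-gram typically apply this kind of weighting during beam search rather than on a final candidate list.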
Eval Results
Without LM | With LM |
---|---|
56.21 | 46.37 |
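To put the table in perspective, the short calculation below derives the absolute and relative improvement, assuming both figures are word error rates in percent (the table does not label its unit). The function name `relative_reduction` is illustrative.

```python
def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


without_lm = 56.21  # from the table above
with_lm = 46.37     # from the table above

absolute = without_lm - with_lm
relative = relative_reduction(without_lm, with_lm)
print(f"absolute: {absolute:.2f} points, relative: {relative:.2f}%")
```

Under that assumption, adding the language model lowers the error by 9.84 points, a relative reduction of roughly 17.5%.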
Directory Structure
<root directory>
|
.- README.md
|
.- Data/
|
.- Model/
|
.- Images/
|
.- Sample/
|
.- Gradio/
|
.- Eval Results/
|     |
|     .- With LM/
|     |
|     .- Without LM/
|     | ...
.- notebook.ipynb
|
.- run_eval.sh
|
.- eval.py
Gradio App
SOTA
- Add Language Model
- Webapp/API
- [] Denoise Audio
- [] Text Processing
- [] Spelling Mistakes
- Hyperparameter optimization
- [] Training on 300 Epochs & 64 Batch Size
- [] Improved Language Model
- [] Contribute to Urdu ASR Audio Dataset
Robust Speech Recognition Challenge 2022
This project was the result of the Hugging Face Robust Speech Recognition Challenge. I was one of the winners, with four state-of-the-art ASR models. Check out my SOTA checkpoints.