---
title: Urdu ASR SOTA
emoji: 👨🎤
colorFrom: green
colorTo: blue
sdk: gradio
app_file: Gradio/app.py
pinned: true
license: apache-2.0
---
# Urdu Automatic Speech Recognition State of the Art Solution

![cover](Images/cover.jpg)

Automatic speech recognition for Urdu using Facebook's wav2vec2-xls-r-300m model and the mozilla-foundation common_voice_8_0 Urdu dataset.
## Model Fine-tuning

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [common_voice dataset](https://commonvoice.mozilla.org/en/datasets).
It achieves the following results on the evaluation set:

- Loss: 0.9889
- WER: 0.5607
- CER: 0.2370
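WER (word error rate) and CER (character error rate) are both edit distances normalized by the reference length, computed over words and characters respectively. A minimal stand-alone sketch of how these metrics are defined (plain Python, not the evaluation script used by this repo):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over token sequences.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution
    return d[-1]

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: word-level edits divided by reference word count.
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: character-level edits divided by reference length.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(round(wer("the cat sat", "the cat sit"), 4))  # -> 0.3333
```

In practice the `jiwer` library or the 🤗 `evaluate` package computes these; the sketch is only to make the numbers above interpretable (a WER of 0.5607 means roughly 56 word edits per 100 reference words).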
## Quick Prediction

Install all dependencies from the `requirment.txt` file, then run the code below to transcribe an audio sample:
```python
import torch
from datasets import load_dataset, Audio
from transformers import pipeline

model = "Model"  # local directory containing the fine-tuned checkpoint

# Load the local Common Voice Urdu test split (tab-separated metadata).
data = load_dataset("Data", "ur", split="test", delimiter="\t")

def path_adjust(batch):
    # Prepend the clips directory so each row points at its audio file.
    batch["path"] = "Data/ur/clips/" + str(batch["path"])
    return batch

data = data.map(path_adjust)

# Decode audio at 16 kHz, the sampling rate the model was trained on.
sample_iter = iter(data.cast_column("path", Audio(sampling_rate=16_000)))
sample = next(sample_iter)

asr = pipeline("automatic-speech-recognition", model=model)
prediction = asr(sample["path"]["array"], chunk_length_s=5, stride_length_s=1)
prediction
# => {'text': 'اب یہ ونگین لمحاتانکھار دلمیں میںفوث کریلیا اجائ'}
```
## Evaluation Commands

To evaluate on `mozilla-foundation/common_voice_8_0` with the `test` split, copy and paste the command below into a terminal:
```bash
python3 eval.py --model_id Model --dataset Data --config ur --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs
```
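The `--chunk_length_s` and `--stride_length_s` flags make the pipeline split long recordings into overlapping windows so that memory stays bounded and words at window edges get context from both sides. A simplified sketch of that windowing logic (plain Python, assuming a 16 kHz sample rate; the real implementation lives inside the 🤗 ASR pipeline):

```python
SAMPLE_RATE = 16_000  # Hz, matches the model's expected input rate

def chunk_audio(samples, chunk_length_s=5.0, stride_length_s=1.0):
    """Split a waveform into overlapping windows: each window is
    chunk_length_s long, and neighbouring windows share
    stride_length_s of audio on each side as context."""
    chunk = int(chunk_length_s * SAMPLE_RATE)
    stride = int(stride_length_s * SAMPLE_RATE)
    step = chunk - 2 * stride  # how far each window start advances
    windows = []
    for start in range(0, len(samples), step):
        windows.append(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # last window reached the end of the audio
    return windows

# 12 seconds of dummy audio -> 5-second windows advancing 3 s each
dummy = [0.0] * (12 * SAMPLE_RATE)
wins = chunk_audio(dummy)
print(len(wins), [len(w) / SAMPLE_RATE for w in wins])
# -> 4 [5.0, 5.0, 5.0, 3.0]
```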
**OR**

Run the shell script:

```bash
bash run_eval.sh
```
## Language Model

Following [Boosting Wav2Vec2 with n-grams in 🤗 Transformers](https://huggingface.co/blog/wav2vec2-with-ngram):

- Get suitable Urdu text data for a language model
- Build an n-gram with KenLM
- Combine the n-gram with a fine-tuned Wav2Vec2 checkpoint

Install kenlm and pyctcdecode before running the notebook:

```bash
pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
```
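The intuition behind the n-gram step: the acoustic model alone can confuse words that sound alike, and the language model breaks the tie toward word sequences it has seen in text. A toy stand-in for what KenLM does at scale, using smoothed bigram counts to rescore two acoustically similar candidate transcripts (illustrative English corpus and sentences; not the actual decoder):

```python
from collections import Counter
from math import log

# Tiny "training corpus" for a toy bigram language model.
corpus = "the cat sat on the mat the dog sat on the rug".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def score(sentence, alpha=0.1):
    """Log-probability of a sentence under an add-alpha-smoothed
    bigram model; higher means more plausible word sequence."""
    words = sentence.split()
    vocab = len(unigrams)
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += log((bigrams[(prev, cur)] + alpha) /
                     (unigrams[prev] + alpha * vocab))
    return total

# The LM prefers the candidate whose word sequence it has seen before.
candidates = ["the cat sat on the mat", "the cat sat on the mad"]
best = max(candidates, key=score)
print(best)  # -> "the cat sat on the mat"
```

In the real setup, pyctcdecode combines scores like these with the Wav2Vec2 CTC probabilities during beam search rather than rescoring whole sentences after the fact.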
## Eval Results

| WER (%) without LM | WER (%) with LM |
| ------------------ | --------------- |
| 56.21              | 46.37           |
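The n-gram decoder cuts WER by nearly 10 points absolute; a quick check of the relative improvement:

```python
wer_no_lm, wer_lm = 56.21, 46.37  # eval-set WER (%) from the table above

# Relative WER reduction delivered by the language model.
relative_reduction = (wer_no_lm - wer_lm) / wer_no_lm
print(f"{relative_reduction:.1%}")  # -> 17.5%
```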
## Directory Structure

```
<root directory>
├── README.md
├── Data/
├── Model/
├── Images/
├── Sample/
├── Gradio/
├── Eval Results/
│   ├── With LM/
│   └── Without LM/
│   ...
├── notebook.ipynb
├── run_eval.sh
└── eval.py
```
## Gradio App
## SOTA

- [x] Add Language Model
- [x] Webapp/API
- [ ] Denoise Audio
- [ ] Text Processing
- [ ] Spelling Mistakes
- [x] Hyperparameters optimization
- [ ] Training on 300 Epochs & 64 Batch Size
- [ ] Improved Language Model
- [ ] Contribute to Urdu ASR Audio Dataset
## Robust Speech Recognition Challenge 2022

This project was the result of the Hugging Face [Robust Speech Recognition Challenge](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614). I was one of the winners, with four state-of-the-art ASR models. Check out my SOTA checkpoints:
- **[Urdu](https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-300m-Urdu)**
- **[Arabic](https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-300-arabic)**
- **[Punjabi](https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-53-punjabi)**
- **[Irish](https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-1b-Irish)**

![winner](Images/winner.png)
## References

- [Common Voice Dataset](https://commonvoice.mozilla.org/en/datasets)
- [Sequence Modeling With CTC](https://distill.pub/2017/ctc/)
- [Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2)
- [Boosting Wav2Vec2 with n-grams in 🤗 Transformers](https://huggingface.co/blog/wav2vec2-with-ngram)
- [HF Model](https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-300m-Urdu)