	---
	license: cc-by-nc-4.0
	language:
	- bn
	library_name: nemo
	pipeline_tag: automatic-speech-recognition
	tags:
	- ASR
	- Automatic Speech Recognition
	- Bangla ASR
	- Bengali ASR
	- bn asr
	- Bangla fastconformer
	- https://arxiv.org/abs/2311.03196
	---
	## Summary
	__titu_stt_bn_fastconformer__ is a [FastConformer](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#fast-conformer)-based Bangla ASR model trained on the ~18K-hour [MegaBNSpeech]() corpus.

	Details are available in the paper: [https://aclanthology.org/2023.banglalp-1.16/](https://aclanthology.org/2023.banglalp-1.16/)

	## Usage
	This model can be used to transcribe Bangla audio. It can also serve as a pre-trained model for fine-tuning on custom datasets with the [NeMo](https://github.com/NVIDIA/NeMo) framework.
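Fine-tuning with NeMo typically starts from a JSON-lines manifest, where each line describes one utterance with `audio_filepath`, `duration` (in seconds) and `text` keys. The sketch below builds such a manifest from WAV files and their transcripts using only the Python standard library. The helpers `wav_duration` and `write_manifest` are hypothetical names used for illustration, and the fields your training config expects should be checked against the NeMo ASR dataset documentation.

```python
import json
import wave
from pathlib import Path
from typing import Iterable, Tuple


def wav_duration(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write one JSON object per line with audio_filepath, duration and text.

    `pairs` yields (wav_path, transcript) tuples. Returns the number of lines written.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, text in pairs:
            entry = {
                "audio_filepath": str(Path(wav_path).resolve()),
                "duration": round(wav_duration(wav_path), 3),
                "text": text.strip(),
            }
            # ensure_ascii=False keeps Bangla script human-readable in the manifest
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_manifest([("clip_001.wav", "আজ সরকারি ছুটির দিন")], "train_manifest.json")` would write a one-line manifest; the file names here are placeholders.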

	### Installation
	To install [NeMo](https://github.com/NVIDIA/NeMo), see the NeMo documentation. The ASR collection can be installed with pip:

	```
	pip install -q 'nemo_toolkit[asr]'
	```

	### Inference
	[Download test_bn_fastconformer.wav](https://huggingface.co/hishab/hishab_bn_fastconformer/blob/main/test_bn_fastconformer.wav)
	```py
	# pip install -q 'nemo_toolkit[asr]'

	import nemo.collections.asr as nemo_asr

	# Download (and cache) the pre-trained checkpoint from the Hugging Face Hub
	asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

	# Transcribe a list of audio file paths
	audio_file = "test_bn_fastconformer.wav"
	transcriptions = asr_model.transcribe([audio_file])
	print(transcriptions)
	# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']
	```
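Many NeMo ASR checkpoints are trained on 16 kHz mono audio. Assuming this one is too (confirm against the model config, e.g. `asr_model.cfg.preprocessor.sample_rate` in typical NeMo configs), a quick standard-library check can flag WAV files in a different format before they are passed to `transcribe`. The function `check_wav` and its default values are hypothetical and not part of NeMo.

```python
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000, expected_channels: int = 1) -> List[str]:
    """Return a list of format problems for a PCM WAV file (an empty list means OK).

    expected_rate and expected_channels are assumptions; match them to the model config.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate {wav.getframerate()} Hz != {expected_rate} Hz")
        if wav.getnchannels() != expected_channels:
            problems.append(f"{wav.getnchannels()} channels != {expected_channels}")
        if wav.getnframes() == 0:
            problems.append("no audio frames")
    return problems
```

Files that fail the check can be converted before transcription, for example with `ffmpeg -i input.wav -ac 1 -ar 16000 output.wav`.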
	Colab notebook for inference: [Bangla FastConformer Infer.ipynb](https://colab.research.google.com/drive/1J3bxXlLBgSf1zOKVKbRYu1VrbEJFLlUc?usp=sharing)

	## Training Datasets

	| Channel category | Hours |
	| ---------------- | ---------: |
	| News | 17,640.00 |
	| Talkshow | 688.82 |
	| Vlog | 0.02 |
	| Crime Show | 4.08 |
	| Total | 18,332.92 |
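The table total can be checked, and each category's share of the training data computed, with a few lines of Python. The hour figures are copied from the table; the variable names are illustrative only.

```python
# Hours per channel category, copied from the training-data table
hours = {
    "News": 17_640.00,
    "Talkshow": 688.82,
    "Vlog": 0.02,
    "Crime Show": 4.08,
}

# Round to two decimals to match the precision used in the table
total = round(sum(hours.values()), 2)
shares = {name: 100 * h / total for name, h in hours.items()}

for name, pct in shares.items():
    print(f"{name:<10} {hours[name]:>10,.2f} h  {pct:6.2f}%")
print(f"{'Total':<10} {total:>10,.2f} h")
```

Running it confirms the 18,332.92-hour total and shows that news content accounts for about 96% of the training hours, with talk shows under 4%.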


	## Training Details

	The training set comprises 17.64K hours of news-channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.

	## Evaluation


	![Evaluation results](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/WvMlp95z2-GXT6AYfwW8Y.png)

	![Evaluation results](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/O2RA9TAedIv1OTqgdIap5.png)
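To score the model's transcriptions on your own test data, a common metric is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words. The following is a minimal pure-Python sketch. The `wer` function is a standalone illustration, not a NeMo API, and it splits on whitespace without any text normalization.

```python
from typing import List


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("আজ সরকারি ছুটির দিন", "আজ সরকারি ছুটি দিন")` returns 0.25: one substituted word out of four.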

	## Citation
	```
	@inproceedings{nandi-etal-2023-pseudo,
	title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
	author = "Nandi, Rabindra Nath and
	Menon, Mehadi and
	Muntasir, Tareq and
	Sarker, Sagor and
	Muhtaseem, Quazi Sarwar and
	Islam, Md. Tariqul and
	Chowdhury, Shammur and
	Alam, Firoj",
	editor = "Alam, Firoj and
	Kar, Sudipta and
	Chowdhury, Shammur Absar and
	Sadeque, Farig and
	Amin, Ruhul",
	booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
	month = dec,
	year = "2023",
	address = "Singapore",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2023.banglalp-1.16",
	doi = "10.18653/v1/2023.banglalp-1.16",
	pages = "152--162",
	abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
	}
	```