---
license: cc-by-nc-4.0
language:
  - bn
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
  - ASR
  - Automatic Speech Recognition
  - Bangla ASR
  - Bengali ASR
  - bn asr
  - Bangla fastconformer
  - https://arxiv.org/abs/2311.03196
---

Summary

titu_stt_bn_fastconformer is a FastConformer-based model trained on the ~18K-hour MegaBNSpeech corpus.

Details are available in the paper: https://aclanthology.org/2023.banglalp-1.16/

How to use

This model can be used to transcribe Bangla audio, and it can also serve as a pre-trained model for fine-tuning on custom datasets with the NeMo framework.
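For fine-tuning, NeMo ASR datasets are typically described by a JSON-lines manifest in which each line holds an `audio_filepath`, a `duration` in seconds, and the reference `text`. The helper below is a minimal sketch that writes such a manifest using only the standard library. The function name `write_manifest` and the example paths are hypothetical; check the NeMo fine-tuning documentation for how the manifest is referenced from the training data config (commonly through a `manifest_filepath` field).

```python
import json


def write_manifest(entries, manifest_path):
    """Write a NeMo-style ASR manifest (one JSON object per line).

    `entries` is an iterable of (audio_filepath, duration_seconds, text) tuples.
    Returns the number of records written. Hypothetical helper for illustration.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in entries:
            record = {
                "audio_filepath": audio_filepath,
                "duration": float(duration),
                "text": text,
            }
            # ensure_ascii=False keeps Bangla characters readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_manifest([("audio/clip_0001.wav", 3.2, "আজ সরকারি ছুটির দিন")], "train_manifest.json")` writes a one-line manifest; the paths here are placeholders for your own data.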

Installation

To install NeMo, see the NeMo documentation.

pip install -q 'nemo_toolkit[asr]'

Inference

Download the sample audio file test_bn_fastconformer.wav.
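Before transcribing your own recordings, it can help to check their format. This card does not state the expected input format; 16 kHz mono audio is typical for NeMo ASR checkpoints, so treat that as an assumption and confirm it against the loaded model's config (for NeMo models it is usually exposed as `asr_model.cfg.preprocessor.sample_rate`). The hypothetical helper `check_wav` below is a minimal sketch using Python's `wave` module, which reads uncompressed PCM WAV files.

```python
import wave

# Assumption: 16 kHz mono input, typical for NeMo ASR models but not stated in this card.
EXPECTED_SAMPLE_RATE = 16_000
EXPECTED_CHANNELS = 1


def check_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file and whether they match the assumed format."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        n_frames = wav.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "sample_width_bytes": sample_width,
        "duration_sec": n_frames / sample_rate,
        "matches_assumed_format": (
            sample_rate == EXPECTED_SAMPLE_RATE and channels == EXPECTED_CHANNELS
        ),
    }
```

Calling `check_wav("test_bn_fastconformer.wav")` returns a dictionary you can inspect; files that do not match can be resampled or downmixed with an audio tool of your choice before transcription.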

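# pip install -q 'nemo_toolkit[asr]'

import nemo.collections.asr as nemo_asr

# Download (or load from the local cache) the pretrained checkpoint from the Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# Path to a local Bangla audio file
audio_file = "test_bn_fastconformer.wav"
# transcribe() takes a list of audio file paths and returns one result per file
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)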
# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']
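To transcribe a folder of recordings rather than a single file, you can build the list of paths first. This is a minimal sketch; `collect_wav_files` is a hypothetical helper, and it only gathers `.wav` files at the top level of the folder (no recursion).

```python
from pathlib import Path


def collect_wav_files(audio_dir: str) -> list[str]:
    """Return sorted paths of .wav files (case-insensitive extension) directly inside audio_dir."""
    return sorted(
        str(p)
        for p in Path(audio_dir).iterdir()
        # skip subdirectories and non-WAV files
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

The returned list can be passed to `asr_model.transcribe(...)` in place of the single-element list in the example above.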

Colab notebook for inference: Bangla FastConformer Infer.ipynb

Training Datasets

| Channel Category | Hours     |
|------------------|-----------|
| News             | 17,640.00 |
| Talkshow         | 688.82    |
| Vlog             | 0.02      |
| Crime Show       | 4.08      |
| Total            | 18,332.92 |
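The table is heavily skewed toward news content. The short calculation below, using the hours copied from the table, checks the total and computes each category's share of the training data.

```python
# Training hours per channel category, copied from the table above.
HOURS = {
    "News": 17_640.00,
    "Talkshow": 688.82,
    "Vlog": 0.02,
    "Crime Show": 4.08,
}

total = sum(HOURS.values())
shares = {name: hours / total for name, hours in HOURS.items()}

print(f"Total: {total:,.2f} hours")
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

News accounts for roughly 96% of the training hours and talk shows for under 4%, while vlogs and crime shows together contribute well under 0.1%.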

Training Details

The training set comprises 17.64k hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows, for a total of 18,332.92 hours.

Evaluation
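To evaluate the model on your own labelled Bangla test set, a standard ASR metric is word error rate (WER): the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a plain-Python implementation; it is not necessarily the scoring setup used in the paper, and it applies no text normalization (punctuation removal, Unicode normalization), which can noticeably affect Bangla scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For corpus-level WER, sum the edit distances over all utterances and divide by the total number of reference words, rather than averaging per-utterance WER values.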


Citation

@inproceedings{nandi-etal-2023-pseudo,
    title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
    author = "Nandi, Rabindra Nath  and
      Menon, Mehadi  and
      Muntasir, Tareq  and
      Sarker, Sagor  and
      Muhtaseem, Quazi Sarwar  and
      Islam, Md. Tariqul  and
      Chowdhury, Shammur  and
      Alam, Firoj",
    editor = "Alam, Firoj  and
      Kar, Sudipta  and
      Chowdhury, Shammur Absar  and
      Sadeque, Farig  and
      Amin, Ruhul",
    booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.banglalp-1.16",
    doi = "10.18653/v1/2023.banglalp-1.16",
    pages = "152--162",
    abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on pseudo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available. https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}