metadata
license: cc-by-nc-4.0
language:
  - bn
library_name: nemo
pipeline_tag: automatic-speech-recognition

Hishab BN FastConformer

Hishab BN FastConformer is a FastConformer-based model trained on the ~18K-hour MegaBNSpeech corpus.

Usage

This model can be used to transcribe Bangla audio, and it can also serve as a pre-trained model for fine-tuning on custom datasets with the NeMo framework.
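Fine-tuning in NeMo usually starts from a JSON-lines manifest that lists each audio file with its duration and transcript. The sketch below shows one way to build such a manifest for a custom dataset. It assumes PCM WAV input and the common `audio_filepath` / `duration` / `text` keys; the helper names `wav_duration_seconds` and `write_manifest` are hypothetical and not part of NeMo.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def write_manifest(samples, manifest_path):
    """Write (audio_path, transcript) pairs as a JSON-lines manifest.

    Assumption: the training config expects one JSON object per line with
    the keys `audio_filepath`, `duration` and `text`.
    """
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio_path, text in samples:
            entry = {
                "audio_filepath": str(audio_path),
                "duration": round(wav_duration_seconds(audio_path), 3),
                "text": text,
            }
            # ensure_ascii=False keeps Bangla text readable in the file.
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return manifest_path
```

The resulting file can then be referenced from the training and validation dataset sections of a NeMo fine-tuning config; check the NeMo documentation for the exact config fields of your version.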

Installation

To install NeMo, see the NeMo documentation.

Inference

Download test_bn_fastconformer.wav

# pip install -q 'nemo_toolkit[asr]'

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/hishab_bn_fastconformer")

audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']

Colab notebook for inference: Bangla FastConformer Infer.ipynb
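When transcribing several files, it can help to normalize the result of `transcribe` and pair it with the input paths. The sketch below assumes each returned item is either a plain string (as in the output shown above) or an object exposing a `.text` attribute, which some NeMo versions return instead. The helpers `to_text` and `pair_with_files` are hypothetical names, not NeMo APIs.

```python
def to_text(transcriptions):
    """Return plain strings from a list of transcription results.

    Assumption: each item is either a str or an object with a `.text`
    attribute; which one you get depends on the installed NeMo version.
    """
    texts = []
    for item in transcriptions:
        if isinstance(item, str):
            texts.append(item)
        else:
            texts.append(item.text)
    return texts


def pair_with_files(audio_files, transcriptions):
    """Map each audio path to its transcript, preserving input order."""
    return dict(zip(audio_files, to_text(transcriptions)))
```

With the model loaded as above, a call such as `pair_with_files(files, asr_model.transcribe(files))` would yield a dictionary from file path to transcript.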

Training Datasets

| Channels Category | Hours |
|---|---|
| News | 17,640.00 |
| Talkshow | 688.82 |
| Vlog | 0.02 |
| Crime Show | 4.08 |
| Total | 18,332.92 |

Training Details

The training dataset comprises 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.
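As a quick sanity check on the figures above, the following sketch sums the per-category hours and computes each category's share of the corpus. The numbers are copied from the table; the `HOURS` dictionary and variable names are only illustrative.

```python
from decimal import Decimal

# Hours per category, as listed in the Training Datasets table.
HOURS = {
    "News": Decimal("17640.00"),
    "Talkshow": Decimal("688.82"),
    "Vlog": Decimal("0.02"),
    "Crime Show": Decimal("4.08"),
}

# Decimal arithmetic avoids binary floating-point rounding in the total.
total = sum(HOURS.values(), Decimal("0"))
shares = {name: hours / total * 100 for name, hours in HOURS.items()}

for name, hours in HOURS.items():
    print(f"{name:<12}{hours:>12,.2f} h {shares[name]:>8.2f} %")
print(f"{'Total':<12}{total:>12,.2f} h")
```

The total comes to 18,332.92 hours, matching the table, and news content accounts for the large majority of the corpus.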

Evaluation

[Figure: evaluation results, image 1]

[Figure: evaluation results, image 2]

Citation