metadata

license: apache-2.0
datasets:
  - mozilla-foundation/common_voice_11_0
language:
  - en
  - bn
metrics:
  - wer
library_name: transformers
pipeline_tag: automatic-speech-recognition

Results

Word error rate (WER): 46%
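WER is the word-level edit distance between a reference transcript and the model output (substitutions S, deletions D, insertions I) divided by the number of reference words N, so WER = (S + D + I) / N. Read as a percentage, a WER of 46 means roughly 46 word errors per 100 reference words, and lower is better. The minimal pure-Python sketch below illustrates the metric. It is not the evaluation script behind the number above. It assumes whitespace tokenization with no text normalization, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance (whitespace tokens)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 reference words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Multiply the result by 100 to compare it with a percentage figure such as the one reported above.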

Use with BanglaSpeech2Text

Installation

pip install banglaspeech2text

Note: git and Git LFS must be installed. For more information, see the BanglaSpeech2Text documentation.

Usage

Use with file

from banglaspeech2text import Model

base_model = Model('whisper_base_bn_sifat')
base_model.load()  # load the pipeline; the first call downloads the model, so it takes longer

audio_file = "test.wav"  # supported formats include .wav, .mp3, .mp4, .ogg, etc.

print(base_model.recognize(audio_file))
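To transcribe several recordings in one pass, you can loop over a directory and call the recognizer on each file. The sketch below does this. `transcribe_folder` and `AUDIO_EXTENSIONS` are hypothetical helpers, not part of banglaspeech2text. The only assumption is that `recognize` accepts a file path, as `base_model.recognize(audio_file)` does above.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical default: the extensions listed in the example above
AUDIO_EXTENSIONS = {".wav", ".mp3", ".mp4", ".ogg"}


def transcribe_folder(
    recognize: Callable[[str], object],
    folder: str,
    extensions: Iterable[str] = AUDIO_EXTENSIONS,
) -> Dict[str, object]:
    """Call `recognize` on every matching file in `folder` (non-recursive), in name order."""
    exts = {e.lower() for e in extensions}
    results: Dict[str, object] = {}
    for path in sorted(Path(folder).iterdir()):
        # Extension matching is case-insensitive, so "clip.MP3" is included
        if path.is_file() and path.suffix.lower() in exts:
            results[path.name] = recognize(str(path))
    return results
```

For example, `transcribe_folder(base_model.recognize, "recordings")` returns a dict mapping each file name to its transcription (`"recordings"` is a placeholder path). Passing the recognizer in as a callable keeps the helper independent of any particular model object.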

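Use with SpeechRecognition

Note: This example also needs the SpeechRecognition package (pip install SpeechRecognition), and sr.Microphone requires PyAudio.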

import speech_recognition as sr
from banglaspeech2text import Model, available_models

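# Pick a model from the list of available models and load it
models = available_models()
model = models[0]  # select a model
model = Model(model)  # create the model object
model.load()  # load the pipeline (downloads the model on first use)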


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output)  # output is a dict; the transcribed text is under the "text" key
print(output['text'])

Note: For more use cases and models, see BanglaSpeech2Text.

Use with transformers

Installation

pip install transformers
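pip install torch

Note: When given a file path, the pipeline decodes the audio with ffmpeg, so ffmpeg must be installed.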

Usage

Use with file

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="shhossain/whisper-base-bn")

def transcribe(audio_path):
    # The pipeline returns a dict; the transcription is under the "text" key
    return pipe(audio_path)["text"]

audio_file = "test.wav"

print(transcribe(audio_file))
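The pipeline also accepts audio that is already in memory: a dict with a `raw` key (a 1-D float array) and a `sampling_rate` key. This input skips the ffmpeg decoding used for file paths. The sketch below produces such samples from a WAV file using only the standard library. It assumes 16-bit PCM input, and `read_wav_pcm16` is a hypothetical helper name.

```python
import sys
import wave
from array import array
from typing import List, Tuple


def read_wav_pcm16(path: str) -> Tuple[List[float], int]:
    """Read a 16-bit PCM WAV file, downmix to mono, and scale samples to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV is supported in this sketch")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian; fix order on big-endian hosts
    # Average interleaved channels per frame, then normalize by the int16 range
    mono = [
        sum(samples[i:i + channels]) / channels / 32768.0
        for i in range(0, len(samples), channels)
    ]
    return mono, rate
```

With NumPy available and `pipe` from the example above, usage would be `samples, rate = read_wav_pcm16("test.wav")` followed by `pipe({"raw": np.asarray(samples, dtype=np.float32), "sampling_rate": rate})["text"]`. Whisper models expect 16 kHz audio. Recordings at other rates get resampled by the pipeline, which may require extra packages such as torchaudio.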