metadata
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- en
- bn
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
Results
- WER 46
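The card does not say how the WER figure was computed (which split, which text normalization), so the following is only a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is made up for illustration and is not part of any library mentioned here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Multiply the result by 100 to express it as a percentage. This sketch splits on whitespace only and applies no punctuation or casing normalization, so its values may differ from those of an evaluation toolkit.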
Use with BanglaSpeech2Text
Installation
pip install banglaspeech2text
Note: git and Git LFS must be installed. For more information, see the banglaspeech2text documentation.
Usage
Use with file
from banglaspeech2text import Model
base_model = Model('whisper_base_bn_sifat')
base_model.load() # Load the pipeline. The first load takes longer because the model has to be downloaded.
audio_file = "test.wav" # .wav, .mp3, .mp4, .ogg, etc.
print(base_model.recognize(audio_file))
Use with SpeechRecognition
import speech_recognition as sr
from banglaspeech2text import Model, available_models
# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # create the model object
model.load() # download (first run only) and load the pipeline
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
output = model.recognize(audio)
print(output) # output is a dict containing the text
print(output['text'])
Note: For more use cases and models, see BanglaSpeech2Text.
Use with transformers
Installation
pip install transformers
pip install torch
Usage
Use with file
from transformers import pipeline
pipe = pipeline('automatic-speech-recognition', 'shhossain/whisper-base-bn')
def transcribe(audio_path):
    return pipe(audio_path)['text']
audio_file = "test.wav"
print(transcribe(audio_file))
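To transcribe several files, it can help to build the pipeline once and reuse it. The sketch below is one way to do that; the helper names `make_transcriber` and `transcribe_many` are made up for illustration. It also passes `chunk_length_s=30` so that recordings longer than Whisper's 30-second window are split into chunks. Whether you need that setting is an assumption that depends on your clips and your transformers version.

```python
from typing import Callable, Dict, Iterable


def make_transcriber(model_id: str = "shhossain/whisper-base-bn") -> Callable[[str], str]:
    """Build the ASR pipeline once and return a path -> text function."""
    # Imported lazily so transcribe_many can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: chunk long recordings into 30 s windows
    )
    return lambda audio_path: pipe(audio_path)["text"]


def transcribe_many(paths: Iterable[str], transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Apply one transcriber to every path and map each path to its text."""
    results: Dict[str, str] = {}
    for path in paths:
        results[path] = transcribe(path)
    return results
```

Typical use would be `transcribe_many(["a.wav", "b.wav"], make_transcriber())`, where the file names are placeholders.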