Results

Word error rate (WER): 46
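The card does not say how this figure was computed or on which test set, and the number is presumably a percentage. As a rough illustration only, word error rate is commonly defined as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name and the sample sentences below are made up for this sketch; they are not part of the model or the library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
print(word_error_rate("আমি ভাত খাই না", "আমি ভাত খাব না"))  # prints 0.25
```

Multiply the result by 100 to compare it with a percentage. Published WER numbers usually also depend on text normalization (punctuation, digits), which this sketch leaves out.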

Use with BanglaSpeech2Text

Test it in Google Colab

Installation

You can install the library using pip:

pip install banglaspeech2text

Usage

Model Initialization

To use the library, initialize the Speech2Text class with the model you want. The default is "base"; the other pre-trained models are "tiny", "small", "medium", and "large". Here's an example:

from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

# You can also omit the model name (the default model is "base")
stt = Speech2Text()

Transcribing Audio Files

You can transcribe an audio file by calling the transcribe method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:

transcription = stt.transcribe("audio.wav")
print(transcription)
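Because transcribe takes a path and returns a string, it can be applied to a whole folder with a small loop. The helper below is a minimal sketch, not part of the library: transcribe_folder and fake_transcribe are hypothetical names, and the stand-in function exists only so the example runs without downloading a model.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(
    transcribe: Callable[[str], str], folder: str, pattern: str = "*.wav"
) -> Dict[str, str]:
    """Apply `transcribe` to every matching file in `folder`, keyed by file name."""
    results = {}
    # Sort the paths so the output order is stable across runs.
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe(str(path))
    return results


# Stand-in for stt.transcribe (an assumption for this sketch);
# pass stt.transcribe instead in real use.
def fake_transcribe(path: str) -> str:
    return f"<transcript of {Path(path).name}>"


print(transcribe_folder(fake_transcribe, "."))
```

With a real model, the call would look like transcribe_folder(stt.transcribe, "recordings"), where "recordings" is whatever folder holds your audio files.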

Use with SpeechRecognition

You can use the SpeechRecognition package to capture audio from a microphone and transcribe it. Here's an example:

import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)

Use GPU

You can use a GPU for faster inference. Here's an example:

stt = Speech2Text(model="base", use_gpu=True)

Advanced GPU Usage

For more advanced GPU usage, you can pass the device or device_map parameter. Here's an example:

# Run the model on a specific GPU
stt = Speech2Text(model="base", device="cuda:0")

# Or let the placement be chosen automatically
stt = Speech2Text(model="base", device_map="auto")

NOTE: Read more about PyTorch devices.
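Hard-coding "cuda:0" fails on machines without a GPU, so it can help to choose the device string at runtime. The sketch below is one possible approach: pick_device is a hypothetical helper, and passing "cpu" as the device value is an assumption based on PyTorch's device naming, not something this card confirms.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU label.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# The result could then be passed on, for example:
# stt = Speech2Text(model="base", device=device)
```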

Instantly Check with gradio

You can try the model right away in a Gradio demo. Here's an example:

from banglaspeech2text import Speech2Text
import gradio as gr

stt = Speech2Text(model="base", use_gpu=True)

# share=True creates a public URL, so you can also open the demo on a phone.
# Note: Gradio 4 and later replaced source="microphone" with sources=["microphone"].
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
).launch(share=True)

Note: For more use cases and models, see BanglaSpeech2Text.

Use with transformers

Installation

pip install transformers
pip install torch

Usage

Use with file

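from transformers import pipeline

# Load the speech recognition pipeline with this model
pipe = pipeline("automatic-speech-recognition", model="shhossain/whisper-base-bn")

def transcribe(audio_path):
    # The pipeline returns a dict; the transcript is under the "text" key
    return pipe(audio_path)["text"]

audio_file = "test.wav"

print(transcribe(audio_file))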
