---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- en
- bn
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

## Results
- WER: 46

# Use with [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)

## Test it in Google Colab
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shhossain/BanglaSpeech2Text/blob/main/banglaspeech2text_in_colab.ipynb)

## Installation
You can install the library using pip:

```bash
pip install banglaspeech2text
```

## Usage
### Model Initialization
To use the library, initialize the `Speech2Text` class with the desired model. By default it uses the "base" model, but you can choose from several pre-trained models: "tiny", "base", "small", "medium", or "large". Here's an example:

```python
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

# You can also use it without specifying a model name (the default model is "base")
stt = Speech2Text()
```

### Transcribing Audio Files
You can transcribe an audio file by calling the `transcribe` method and passing the path to the audio file. It returns the transcribed text as a string. Here's an example:

```python
transcription = stt.transcribe("audio.wav")
print(transcription)
```

### Use with SpeechRecognition
You can use the [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) package to capture audio from the microphone and transcribe it. Here's an example:

```python
import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)
```

### Use GPU
You can use a GPU for faster inference. Here's an example:

```python
stt = Speech2Text(model="base", use_gpu=True)
```

### Advanced GPU Usage
For more advanced GPU usage, you can use the `device` or `device_map` parameter. Here's an example:

```python
stt = Speech2Text(model="base", device="cuda:0")
```

```python
stt = Speech2Text(model="base", device_map="auto")
```

__NOTE__: Read more about [PyTorch devices](https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.device).

### Instantly Check with Gradio
You can instantly try the model with Gradio. Here's an example:

```python
from banglaspeech2text import Speech2Text, available_models
import gradio as gr

stt = Speech2Text(model="base", use_gpu=True)

# You can also open the URL and test it on mobile
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text").launch(share=True)
```

__Note__: For more use cases and models, see [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text).

# Use with transformers

### Installation

```bash
pip install transformers
pip install torch
```

## Usage
### Use with file

```python
from transformers import pipeline

pipe = pipeline('automatic-speech-recognition', 'shhossain/whisper-base-bn')

def transcribe(audio_path):
    return pipe(audio_path)['text']

audio_file = "test.wav"
print(transcribe(audio_file))
```
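
### Use with GPU

The `transformers` pipeline can also run on a GPU via its standard `device` argument. A minimal sketch, assuming a single CUDA GPU at index 0 (adjust the index for your machine):

```python
from transformers import pipeline

# device=0 selects the first CUDA GPU; the pipeline defaults to CPU if device is omitted
pipe = pipeline('automatic-speech-recognition', 'shhossain/whisper-base-bn', device=0)

print(pipe("test.wav")['text'])
```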
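
### Transcribing Long Audio

Whisper-style models process audio in 30-second windows, so longer recordings need to be chunked. A minimal sketch using the standard `chunk_length_s` option of the `transformers` ASR pipeline (the file name below is a placeholder):

```python
from transformers import pipeline

# chunk_length_s splits long audio into 30-second windows and stitches the transcripts together
pipe = pipeline(
    'automatic-speech-recognition',
    'shhossain/whisper-base-bn',
    chunk_length_s=30,
)

print(pipe("long_audio.wav")['text'])
```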