shhossain committed
Commit 4ffe2fb
1 Parent(s): e29c19d

Update README.md

Files changed (1)
  1. README.md +55 -21

README.md CHANGED
@@ -13,52 +13,86 @@ pipeline_tag: automatic-speech-recognition
  ## Results
  - WER 46

- # Use with banglaSpeech2text

- ## Test in google colab
- - [NoteBook](https://colab.research.google.com/drive/1rj4Jme6qrc8tRaPY3MTuuUc6MEr8We9N?usp=sharing)

  ## Installation
  ```bash
  pip install banglaspeech2text
  ```
- __Note__: Must have git and git lfs installed. For more info visit banglaspeech2text doc [here](https://github.com/shhossain/BanglaSpeech2Text#download-git)
-

  ## Usage

- ### Use with file
  ```python
- from banglaspeech2text import Model

- base_model = Model('whisper_base_bn_sifat')
- base_model.load() # loading the pipline. first time loading will take time as the model is not downloaded yet.

- audio_file = "test.wav" # .wav, .mp3, mp4, .ogg, etc.

- print(base_model.recognize(audio_file))
  ```

  ### Use with SpeechRecognition
  ```python
  import speech_recognition as sr
- from banglaspeech2text import Model, available_models
-
- # Load a model
- models = available_models()
- model = models[0] # select a model
- model = Model(model) # load the model
- model.load()

  r = sr.Recognizer()
  with sr.Microphone() as source:
      print("Say something!")
      audio = r.listen(source)
- output = model.recognize(audio)

- print(output) # output will be a direct containing text
- print(output['text'])
  ```

  __Note__: For more usecases and models -> [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)
 
  ## Results
  - WER 46

+ # Use with BanglaSpeech2Text

+ ## Test it in Google Colab
+ - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shhossain/BanglaSpeech2Text/blob/main/BanglaSpeech2Text_in_Colab.ipynb)

  ## Installation
+ You can install the library using pip:
+
  ```bash
  pip install banglaspeech2text
  ```

  ## Usage
+ ### Model Initialization
+ To use the library, initialize the `Speech2Text` class with the desired model. By default it uses the "base" model, but you can choose from the pre-trained models "tiny", "small", "medium", "base", or "large". Here's an example:

  ```python
+ from banglaspeech2text import Speech2Text

+ stt = Speech2Text(model="base")

+ # You can also use it without specifying a model name (the default model is "base")
+ stt = Speech2Text()
+ ```
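If you want to check which model names are valid at runtime, the package also exposes `available_models` (it is imported alongside `Speech2Text` in the Gradio example further down). A minimal sketch, assuming `available_models()` simply returns a printable listing of the bundled models:

```python
from banglaspeech2text import available_models

# Print whatever listing the package provides for its pre-trained models;
# the exact return type (string vs. sequence) may vary between versions.
print(available_models())
```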

+ ### Transcribing Audio Files
+ You can transcribe an audio file by calling the `transcribe` method with the path to the audio file. It returns the transcribed text as a string. Here's an example:

+ ```python
+ transcription = stt.transcribe("audio.wav")
+ print(transcription)
  ```
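Because `transcribe` takes a file path and returns a string, a folder of recordings can be handled with a plain loop. A minimal usage sketch, assuming a hypothetical `clips/` directory of `.wav` files:

```python
from pathlib import Path

from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")  # same initialization as above

# Transcribe every .wav file in the example "clips" directory
for audio_path in sorted(Path("clips").glob("*.wav")):
    print(audio_path.name, "->", stt.transcribe(str(audio_path)))
```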
+
  ### Use with SpeechRecognition
+ You can use the [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) package to capture audio from the microphone and transcribe it. Here's an example:
  ```python
  import speech_recognition as sr
+ from banglaspeech2text import Speech2Text

+ stt = Speech2Text(model="base")

  r = sr.Recognizer()
  with sr.Microphone() as source:
      print("Say something!")
      audio = r.listen(source)
+ output = stt.recognize(audio)
+
+ print(output)
+ ```
+
+ ### Use GPU
+ You can use a GPU for faster inference. Here's an example:
+ ```python
+ stt = Speech2Text(model="base", use_gpu=True)
+ ```
+
+ ### Advanced GPU Usage
+ For more advanced GPU usage you can use the `device` or `device_map` parameter. Here's an example:
+ ```python
+ stt = Speech2Text(model="base", device="cuda:0")
+ ```
+ ```python
+ stt = Speech2Text(model="base", device_map="auto")
+ ```
+ __NOTE__: Read more about the [PyTorch device](https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.device)
+
+ ### Instantly Check with Gradio
+ You can instantly check the model with Gradio. Here's an example:
+ ```python
+ from banglaspeech2text import Speech2Text, available_models
+ import gradio as gr
+
+ stt = Speech2Text(model="base", use_gpu=True)

+ # You can also open the shared URL on a mobile device to test it there
+ gr.Interface(
+     fn=stt.transcribe,
+     inputs=gr.Audio(source="microphone", type="filepath"),
+     outputs="text").launch(share=True)
  ```

  __Note__: For more use cases and models -> [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)