---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- en
- bn
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
## Results
- Word Error Rate (WER): 46
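
As a quick illustration of how such a WER score can be reproduced, here is a minimal sketch using the Hugging Face `evaluate` library. The transcripts below are placeholders, not taken from the actual evaluation set:

```python
# Minimal sketch: computing WER with the `evaluate` library.
# The predictions/references below are illustrative placeholders.
import evaluate

wer_metric = evaluate.load("wer")

references = ["আমি ভাত খাই", "সে স্কুলে যায়"]   # ground-truth transcripts
predictions = ["আমি ভাত খাই", "সে স্কুল যায়"]   # model outputs

score = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {score * 100:.1f}")  # lower is better
```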

# Use with [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)

## Test it in Google Colab
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shhossain/BanglaSpeech2Text/blob/main/banglaspeech2text_in_colab.ipynb)

## Installation
You can install the library using pip:

```bash
pip install banglaspeech2text
```

## Usage
### Model Initialization
To use the library, initialize the `Speech2Text` class with the desired model. By default it uses the "base" model, but you can choose from several pre-trained models: "tiny", "small", "medium", "base", or "large". Here's an example:

```python
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

# You can also use it without specifying a model name (the default model is "base")
stt = Speech2Text()
```

### Transcribing Audio Files
You can transcribe an audio file by calling the `transcribe` method with the path to the audio file. It returns the transcribed text as a string. Here's an example:

```python
transcription = stt.transcribe("audio.wav")
print(transcription)
```

### Use with SpeechRecognition
You can use the [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) package to capture audio from the microphone and transcribe it. Here's an example:
```python
import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)
```

### Use GPU
You can use a GPU for faster inference. Here's an example:
```python
stt = Speech2Text(model="base", use_gpu=True)
```
### Advanced GPU Usage
For more advanced GPU usage, you can use the `device` or `device_map` parameter. Here's an example:
```python
stt = Speech2Text(model="base", device="cuda:0")
```
```python
stt = Speech2Text(model="base", device_map="auto")
```
__NOTE__: Read more about [PyTorch devices](https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.device)

### Instantly Check with Gradio
You can quickly try out the model with Gradio. Here's an example:
```python
from banglaspeech2text import Speech2Text, available_models
import gradio as gr

stt = Speech2Text(model="base", use_gpu=True)

# You can also open the shared URL on a mobile device
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text").launch(share=True)
```

__Note__: For more use cases and models, see [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)

# Use with transformers
### Installation
```bash
pip install transformers
pip install torch
```

## Usage

### Use with file
```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
pipe = pipeline('automatic-speech-recognition', 'shhossain/whisper-base-bn')

def transcribe(audio_path):
    return pipe(audio_path)['text']

audio_file = "test.wav"

print(transcribe(audio_file))
```
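
If a GPU is available, the same pipeline can be moved onto it, and longer recordings can be handled with chunked inference. A minimal sketch; the `device` index and `chunk_length_s` value are example settings, not taken from this model card:

```python
from transformers import pipeline

# Sketch: run the pipeline on a GPU and chunk long audio files.
# device index and chunk_length_s are example values; adjust to your setup.
pipe = pipeline(
    'automatic-speech-recognition',
    'shhossain/whisper-base-bn',
    device=0,            # first CUDA device; use device=-1 (or omit) for CPU
    chunk_length_s=30,   # split long recordings into 30-second chunks
)

print(pipe("long_audio.wav")['text'])
```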