poonehmousavi committed on
Commit
5865280
1 Parent(s): 5925f2c

Upload 2 files

Files changed (2)
  1. README.md +132 -0
  2. config.json +3 -0
README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ language:
+ - mn
+ thumbnail: null
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - whisper
+ - pytorch
+ - speechbrain
+ - Transformer
+ - hf-asr-leaderboard
+ license: apache-2.0
+ datasets:
+ - commonvoice
+ metrics:
+ - wer
+ - cer
+ model-index:
+ - name: asr-whisper-large-v2-commonvoice-mn
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: CommonVoice 10.0 (Mongolian)
+       type: mozilla-foundation/common_voice_10_0
+       config: mn
+       split: test
+       args:
+         language: mn
+     metrics:
+     - name: Test WER
+       type: wer
+       value: '64.92'
+ ---
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+ # Whisper large-v2 fine-tuned on CommonVoice Mongolian
+
+ This repository provides all the necessary tools to perform automatic speech
+ recognition with an end-to-end Whisper model fine-tuned on CommonVoice (Mongolian) within
+ SpeechBrain. For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ The model achieves the following performance:
+
+ | Release | Test CER | Test WER | GPUs |
+ |:-------------:|:--------------:|:--------------:| :--------:|
+ | 01-02-23 | 25.73 | 64.92 | 1xV100 16GB |
+
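For readers unfamiliar with the two metrics in the table, the sketch below shows how a word error rate (WER) and a character error rate (CER) are commonly computed: the Levenshtein distance between reference and hypothesis, divided by the reference length, expressed as a percentage. The function names are illustrative and are not SpeechBrain's API, and details such as text normalization or whether spaces count as characters are assumptions that may differ from the recipe's scoring.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions, insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance as a percentage of the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: tokens are single characters (spaces included here)."""
    return error_rate(list(reference), list(hypothesis))
```

Because a single wrong character makes the whole word wrong, WER is usually higher than CER on the same output, which matches the pattern in the table.
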
+ ## Pipeline description
+
+ This ASR system is composed of Whisper encoder-decoder blocks:
+ - The pretrained Whisper-large-v2 encoder is frozen.
+ - The pretrained Whisper tokenizer is used.
+ - The pretrained Whisper-large-v2 decoder ([openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)) is fine-tuned on CommonVoice MN.
+ The final acoustic representation is passed to a greedy decoder.
+
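As a rough illustration of what greedy decoding means here, the sketch below picks the single highest-scoring token at every step and stops at the end-of-sequence token. It is a toy under stated assumptions: `next_token_scores` is a hypothetical stand-in for the Whisper decoder (which in practice also conditions on the encoder output and on special prompt tokens), and the token ids are invented for the example.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 50,
) -> List[int]:
    """Pick the highest-scoring token at every step until EOS or max_len."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if best == eos_id:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token


def toy_scores(prefix: List[int]) -> List[float]:
    """Hypothetical decoder: emits token 2, then token 3, then EOS (id 1)."""
    script = {
        1: [0.1, 0.1, 0.7, 0.1],
        2: [0.1, 0.1, 0.1, 0.7],
        3: [0.1, 0.8, 0.05, 0.05],
    }
    return script[len(prefix)]


decoded = greedy_decode(toy_scores, bos_id=0, eos_id=1)
print(decoded)  # [2, 3]
```

Greedy search keeps only one hypothesis, so it is cheap but can miss sequences that a beam search would find.
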
+ The system is trained with recordings sampled at 16 kHz (single channel).
+ The code automatically normalizes your audio (i.e., resampling + mono channel selection), if needed, when you call `transcribe_file`.
+
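To make the normalization step concrete, here is a minimal pure-Python sketch of the two operations named above: reducing a multi-channel recording to mono and resampling it to 16 kHz. It is not SpeechBrain's implementation. Averaging the channels and using linear interpolation without an anti-aliasing filter are simplifying assumptions, and the function names are hypothetical.

```python
from typing import List


def to_mono(channels: List[List[float]]) -> List[float]:
    """Average equally long channels into a single mono signal."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(signal: List[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample by linear interpolation (simplified: no anti-aliasing filter)."""
    if orig_sr == target_sr or not signal:
        return list(signal)
    out_len = int(len(signal) * target_sr / orig_sr)
    out = []
    for i in range(out_len):
        pos = i * orig_sr / target_sr  # fractional index into the input
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1.0 - frac) + signal[right] * frac)
    return out


def normalize(channels: List[List[float]], orig_sr: int, target_sr: int = 16000) -> List[float]:
    """Mono conversion followed by resampling to the model's sample rate."""
    return resample_linear(to_mono(channels), orig_sr, target_sr)
```

For example, `normalize([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]], orig_sr=32000)` halves the number of samples of a stereo 32 kHz signal.
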
+ ## Install SpeechBrain
+
+ First, install transformers and SpeechBrain with the following command:
+
+ ```
+ pip install speechbrain transformers
+ ```
+
+ We encourage you to read our tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ### Transcribing your own audio files (in Mongolian)
+
+ ```python
+ from speechbrain.pretrained import WhisperASR
+
+ # Fetch the model files from the Hugging Face Hub and store them in `savedir`.
+ asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-large-v2-commonvoice-mn", savedir="pretrained_models/asr-whisper-large-v2-commonvoice-mn")
+ # Transcribe an audio file.
+ asr_model.transcribe_file("speechbrain/asr-whisper-large-v2-commonvoice-mn/example-mn.mp3")
+ ```
+
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+
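The sketch below shows one way the `run_opts` argument could be wired into the loading call. `build_run_opts` and `load_asr` are hypothetical helper names introduced for illustration; only `WhisperASR.from_hparams` and the `run_opts={"device": ...}` argument come from this model card. SpeechBrain is imported inside the function so the helper can be defined without the library installed.

```python
from typing import Dict

MODEL_SOURCE = "speechbrain/asr-whisper-large-v2-commonvoice-mn"


def build_run_opts(use_gpu: bool) -> Dict[str, str]:
    """Build the run_opts dictionary passed to from_hparams."""
    return {"device": "cuda" if use_gpu else "cpu"}


def load_asr(use_gpu: bool = False):
    """Load the fine-tuned model on the requested device (hypothetical helper)."""
    # Imported lazily: requires `pip install speechbrain transformers`.
    from speechbrain.pretrained import WhisperASR

    return WhisperASR.from_hparams(
        source=MODEL_SOURCE,
        savedir="pretrained_models/asr-whisper-large-v2-commonvoice-mn",
        run_opts=build_run_opts(use_gpu),
    )
```

Calling `load_asr(use_gpu=True)` would then place the model on the GPU, assuming a CUDA-capable PyTorch installation.
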
+ ### Training
+ The model was trained with SpeechBrain.
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```bash
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run the training:
+ ```bash
+ cd recipes/CommonVoice/ASR/transformer/
+ python train_with_whisper.py hparams/train_mn_hf_whisper.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/10E2xclgNx_6BFxNmv9i1HorBNnsMveP_?usp=share_link).
+
+ ### Limitations
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ #### Referencing SpeechBrain
+
+ ```
+ @misc{SB2021,
+   author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Chou, Ju-Chieh and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+   title = {SpeechBrain},
+   year = {2021},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
+
+ #### About SpeechBrain
+ SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. It obtains competitive or state-of-the-art performance in various domains.
+
+ Website: https://speechbrain.github.io/
+
+ GitHub: https://github.com/speechbrain/speechbrain
config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "speechbrain_interface": "WhisperASR"
+ }