sangeet2020 committed
Commit c094471
1 Parent(s): f9ddca8

add encoder and readme

Files changed (2):
  1. README.md +146 -0
  2. encoder.ckpt +0 -0
README.md CHANGED
---
language:
- de
thumbnail: null
pipeline_tag: automatic-speech-recognition
tags:
- whisper
- pytorch
- speechbrain
- Transformer
license: apache-2.0
datasets:
- RescueSpeech
metrics:
- wer
- sisnri
- sdri
- pesq
- stoi
model-index:
- name: noisy-whisper-resucespeech
  results:
  - task:
      name: Noise Robust Automatic Speech Recognition
      type: noise-robust-automatic-speech-recognition
    dataset:
      name: RescueSpeech
      type: zenodo.org/record/8077622
      config: de
      split: test
      args:
        language: de
    metrics:
    - name: Test WER
      type: wer
      value: '24.20'
    - name: Test PESQ
      type: pesq
      value: '2.085'
    - name: Test SI-SNRi
      type: si-snri
      value: '7.334'
    - name: Test SI-SDRi
      type: si-sdri
      value: '7.871'
---

# Noise-robust speech recognition with jointly trained SepFormer speech enhancement and Whisper ASR on RescueSpeech

This repository provides all the necessary tools to perform noise-robust automatic speech
recognition with a simple combination of an enhancement model (**SepFormer**) and a speech recognizer (**Whisper**).
The models are first fine-tuned individually on the RescueSpeech dataset and then trained jointly, enabling them to handle noise interference effectively. For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io).
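
The enhancement front-end is evaluated with scale-invariant metrics; SI-SNRi and SI-SDRi report the *improvement* of the enhanced signal over the noisy input. As a point of reference only, here is a minimal pure-Python sketch of SI-SNR; `si_snr` is an illustrative helper, not the evaluation code behind the numbers reported here:

```python
import math

def si_snr(reference, estimate):
    """Scale-invariant signal-to-noise ratio (dB) between two
    equal-length, zero-mean sample sequences (illustrative sketch)."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference: the "target" component.
    target = [dot / ref_energy * r for r in reference]
    # Everything orthogonal to the reference counts as noise.
    noise = [e - t for e, t in zip(estimate, target)]
    t_energy = sum(t * t for t in target)
    n_energy = sum(n * n for n in noise)
    return 10 * math.log10(t_energy / n_energy)
```

Because the projection absorbs any rescaling of the estimate, multiplying the estimate by a constant leaves the score unchanged, which is what "scale-invariant" means here.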

The performance of the model is the following:

| Release | SI-SNRi | SI-SDRi | PESQ | STOI | WER | GPUs |
|:--------:|:-------:|:-------:|:-----:|:-----:|:------:|:------------:|
| 07-11-23 | 7.334 | 7.871 | 2.085 | 0.857 | 24.20 | 1xA100 80 GB |
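
The ASR numbers use the standard word error rate: the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch (illustrative only; it is not the scoring code used for the table above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via a rolling-array Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # distances for the empty reference
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev holds the diagonal d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,         # deletion
                      d[j - 1] + 1,     # insertion
                      prev + (r != h))  # substitution (free on match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)
```

For example, one substituted word in a three-word reference gives a WER of 1/3.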

## Pipeline description
- The enhancement system is composed of a SepFormer model.
- The model is first trained on the Microsoft-DNS dataset and subsequently fine-tuned on the RescueSpeech dataset.
- The enhanced utterances are fed to the ASR model.
- The ASR system is composed of Whisper encoder-decoder blocks:
  - The pretrained whisper-large-v2 encoder is frozen.
  - The pretrained Whisper tokenizer is used.
  - A pretrained whisper-large-v2 decoder ([openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)) is fine-tuned on the RescueSpeech dataset.
  - The final acoustic representation is given to the greedy decoder.

The system is trained with recordings sampled at 16 kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file*, if needed.
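
To make that normalization step concrete, the sketch below mixes a multi-channel signal down to mono and linearly resamples it to 16 kHz. `normalize_audio` is a hypothetical helper for illustration only; it is not how SpeechBrain implements this internally:

```python
def normalize_audio(samples, in_rate, out_rate=16000):
    """Mono-mix a list of per-frame channel tuples and linearly
    resample the result to `out_rate` Hz (illustrative sketch only)."""
    # Mono channel selection: average the channels of each frame.
    mono = [sum(frame) / len(frame) for frame in samples]
    if in_rate == out_rate:
        return mono
    # Naive linear-interpolation resampling to the target rate.
    n_out = int(len(mono) * out_rate / in_rate)
    out = []
    for i in range(n_out):
        pos = i * (len(mono) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(mono) - 1)
        frac = pos - lo
        out.append(mono[lo] * (1 - frac) + mono[hi] * frac)
    return out
```

In practice a proper resampler (e.g. torchaudio's) is preferable to linear interpolation, but the shape of the operation is the same: collapse channels, then change the sample rate.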

## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```bash
pip install speechbrain transformers==4.28.0
```

Please notice that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Transcribing your own audio files (in German)

```python
from speechbrain.pretrained import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/rescuespeech_whisper", savedir="pretrained_models/rescuespeech_whisper")
asr_model.transcribe_file("speechbrain/rescuespeech_whisper/example_de.wav")
```

### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

You can find our training results (models, logs, etc.) [here](https://www.dropbox.com/sh/7tryj6n7cfy0poe/AADpl4b8rGRSnoQ5j6LCj9tua?dl=0).

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

### Referencing SpeechBrain

```bibtex
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```

### Referencing RescueSpeech
```bibtex
@misc{sagar2023rescuespeech,
  title = {RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain},
  author = {Sangeet Sagar and Mirco Ravanelli and Bernd Kiefer and Ivana Kruijff Korbayova and Josef van Genabith},
  year = {2023},
  eprint = {2306.04054},
  archivePrefix = {arXiv},
  primaryClass = {eess.AS}
}
```

### About SpeechBrain
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain
136
+
137
+
138
+
139
+ ```bash
140
+ from speechbrain.pretrained import SepformerSeparation as Separator
141
+ from speechbrain.pretrained import WhisperASR
142
+
143
+ enh_model = Separator.from_hparams(source="CKPT+2023-06-24+21-49-17+00", savedir='pretrained_models/sepformer_rescuespeech', hparams_file='hyperparams_asr.yaml')
144
+ asr_model = WhisperASR.from_hparams(source="CKPT+2023-06-24+21-49-17+00", savedir="pretrained_models/whisper_rescuespeech", hparams_file='hyperparams_asr.yaml')
145
+
146
+ # For custom file, change the path accordingly
147
+ est_sources = enh_model.separate_file(path='example_rescuespeech16k.wav')
148
+ print(asr_model(est_sources[:, :, 0]))
149
+ ```
encoder.ckpt ADDED
Binary file (17.3 kB)