poonehmousavi committed on
Commit
09c1198
1 Parent(s): 052101d

Update README.md

  1. README.md +18 -18
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  language:
- - hi
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
@@ -16,31 +16,31 @@ metrics:
  - wer
  - cer
  model-index:
- - name: asr-whisper-large-v2-commonvoice-ar
  results:
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
  dataset:
- name: CommonVoice 10.0 (Hindi)
- type: mozilla-foundation/common_voice_10_0
- config: hi
  split: test
  args:
- language: hi
  metrics:
  - name: Test WER
  type: wer
- value: '15.27'
  ---
 
- <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>
 
- # whisper large-v2 fine-tuned on CommonVoice Hindi
 
  This repository provides all the necessary tools to perform automatic speech
- recognition from an end-to-end whisper model fine-tuned on CommonVoice (Hindi Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
 
@@ -48,14 +48,14 @@ The performance of the model is the following:
 
  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
- | 01-02-23 | 7.00 | 15.27 | 1xV100 16GB |
 
  ## Pipeline description
 
  This ASR system is composed of whisper encoder-decoder blocks:
- - The pretrained whisper-large-v2 encoder is frozen.
  - The pretrained Whisper tokenizer is used.
- - A pretrained Whisper-large-v2 decoder ([openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)) is finetuned on CommonVoice hi.
  The obtained final acoustic representation is given to the greedy decoder.
 
  The system is trained with recordings sampled at 16kHz (single channel).
@@ -66,20 +66,20 @@ The code will automatically normalize your audio (i.e., resampling + mono channe
  First of all, please install transformers and SpeechBrain with the following command:
 
  ```
- pip install speechbrain transformers==4.28.0
  ```
 
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
- ### Transcribing your own audio files (in Hindi)
 
  ```python
 
  from speechbrain.pretrained import WhisperASR
 
- asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-large-v2-commonvoice-hi", savedir="pretrained_models/asr-whisper-large-v2-commonvoice-hi")
- asr_model.transcribe_file("speechbrain/asr-whisper-large-v2-commonvoice-hi/example-hi.mp3")
 
 
  ```
@@ -103,7 +103,7 @@ pip install -e .
  3. Run Training:
  ```bash
  cd recipes/CommonVoice/ASR/transformer/
- python train_with_whisper.py hparams/train_hi_hf_whisper.yaml --data_folder=your_data_folder
  ```
 
  You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/11PKCsyIE703mmDv6n6n_UnD0bUgMPbg_?usp=share_link).
 
  ---
  language:
+ - ar
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
 
  - wer
  - cer
  model-index:
+ - name: asr-whisper-medium-commonvoice-ar
  results:
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
  dataset:
+ name: CommonVoice 10.0 (Arabic)
+ type: mozilla-foundation/common_voice_14_0
+ config: ar
  split: test
  args:
+ language: ar
  metrics:
  - name: Test WER
  type: wer
+ value: '14.82'
  ---
 
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=medium" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>
 
+ # whisper medium fine-tuned on CommonVoice-14.0 Arabic
 
  This repository provides all the necessary tools to perform automatic speech
+ recognition from an end-to-end whisper model fine-tuned on CommonVoice (Arabic Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
 
 
 
  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
+ | 1-08-23 | 4.95 | 14.82 | 1xV100 32GB |
 
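The Test CER and Test WER columns above are edit-distance error rates, reported as percentages. As a rough illustration of how such numbers are obtained, here is a minimal pure-Python sketch. The function names `edit_distance` and `error_rate` are hypothetical and not part of SpeechBrain, and the text normalization the recipe applies before scoring is not reproduced here, so this sketch is not expected to match the reported figures exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference tokens.

    unit="word" splits on whitespace (WER); unit="char" scores characters (CER).
    """
    total_errors = 0
    total_tokens = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = ref.split() if unit == "word" else list(ref)
        hyp_tokens = hyp.split() if unit == "word" else list(hyp)
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_tokens += len(ref_tokens)
    if total_tokens == 0:
        raise ValueError("reference transcripts contain no tokens")
    return 100.0 * total_errors / total_tokens
```

For example, `error_rate(["a b c d"], ["a x c"])` counts one substitution and one deletion against four reference words. Note that the character-level variant here counts spaces as characters, which some scoring tools handle differently.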
  ## Pipeline description
 
  This ASR system is composed of whisper encoder-decoder blocks:
+ - The pretrained whisper-medium encoder is frozen.
  - The pretrained Whisper tokenizer is used.
+ - A pretrained Whisper-medium decoder ([openai/whisper-medium](https://huggingface.co/openai/whisper-medium)) is finetuned on CommonVoice ar.
  The obtained final acoustic representation is given to the greedy decoder.
 
  The system is trained with recordings sampled at 16kHz (single channel).
 
  First of all, please install transformers and SpeechBrain with the following command:
 
  ```
+ pip install speechbrain transformers
  ```
 
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
+ ### Transcribing your own audio files (in Arabic)
 
  ```python
 
  from speechbrain.pretrained import WhisperASR
 
+ asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-ar", savedir="pretrained_models/asr-whisper-medium-commonvoice-ar")
+ asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-ar/example-ar.mp3")
 
 
  ```
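If you have several recordings rather than one file, the snippet above can be wrapped in a small loop. The following is only a sketch: `transcribe_folder`, `load_asr_model`, and `AUDIO_EXTENSIONS` are hypothetical names introduced for illustration, and the set of file extensions is an assumption. The only SpeechBrain calls used are the two shown above (`WhisperASR.from_hparams` and `transcribe_file`).

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever your audio backend can read.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def load_asr_model():
    """Load the model exactly as in the README snippet."""
    # Imported lazily so the folder helper below works without SpeechBrain installed.
    from speechbrain.pretrained import WhisperASR

    return WhisperASR.from_hparams(
        source="speechbrain/asr-whisper-medium-commonvoice-ar",
        savedir="pretrained_models/asr-whisper-medium-commonvoice-ar",
    )


def transcribe_folder(transcribe_fn, folder):
    """Map each audio file name directly inside `folder` to transcribe_fn(path)."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            # The value is whatever transcribe_fn returns for this file.
            results[path.name] = transcribe_fn(str(path))
    return results
```

A typical call would be `transcribe_folder(load_asr_model().transcribe_file, "my_arabic_clips")`, where `my_arabic_clips` is a placeholder directory name. Passing the transcription function as an argument keeps the loop independent of the model, so it can be tried with a stub before downloading any weights.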
 
  3. Run Training:
  ```bash
  cd recipes/CommonVoice/ASR/transformer/
+ python train_with_whisper.py hparams/train_ar_hf_whisper.yaml --data_folder=your_data_folder
  ```
 
  You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/11PKCsyIE703mmDv6n6n_UnD0bUgMPbg_?usp=share_link).