poonehmousavi committed on
Commit
06f25b2
1 Parent(s): 04a099d

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  language:
- - de
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
@@ -10,36 +10,36 @@ tags:
  - Transformer
  license: apache-2.0
  datasets:
- - commonvoice
  metrics:
  - wer
  - cer
  model-index:
- - name: asr-wav2vec2-commonvoice-de
  results:
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
  dataset:
- name: CommonVoice Corpus 10.0/ (German)
- type: mozilla-foundation/common_voice_10_1
- config: de
  split: test
  args:
- language: de
  metrics:
  - name: Test WER
  type: wer
- value: '9.54'
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # wav2vec 2.0 with CTC trained on CommonVoice German (No LM)

  This repository provides all the necessary tools to perform automatic speech
- recognition from an end-to-end system pretrained on CommonVoice (German Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).

@@ -47,14 +47,14 @@ The performance of the model is the following:

  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
- | 16-08-22 | 2.40 | 9.54 | 1xRTXA6000 48GB |

  ## Pipeline description

  This ASR system is composed of 2 different but linked blocks:
- - Tokenizer (char) that transforms words into chars and trained with
- the train transcriptions (train.tsv) of CommonVoice (DE).
- - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([wav2vec2-large-xlsr-53-german](https://huggingface.co/facebook/wav2vec2-large-xlsr-53-german)) is combined with two DNN layers and finetuned on CommonVoice DE.
  The obtained final acoustic representation is given to the CTC decoder.

  The system is trained with recordings sampled at 16kHz (single channel).
@@ -71,13 +71,13 @@ pip install speechbrain transformers
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Transcribing your own audio files (in German)

  ```python
  from speechbrain.pretrained import EncoderASR

- asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-de", savedir="pretrained_models/asr-wav2vec2-commonvoice-de")
- asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-de/example-de.wav")

  ```
  ### Inference on GPU
 
  ---
  language:
+ - en
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
 
  - Transformer
  license: apache-2.0
  datasets:
+ - commonvoice.14.0
  metrics:
  - wer
  - cer
  model-index:
+ - name: asr-wav2vec2-commonvoice-14-en
  results:
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
  dataset:
+ name: CommonVoice Corpus 14.0/ (English)
+ type: mozilla-foundation/common_voice_14.0
+ config: en
  split: test
  args:
+ language: en
  metrics:
  - name: Test WER
  type: wer
+ value: '16.68'
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

+ # wav2vec 2.0 with CTC trained on CommonVoice English (No LM)

  This repository provides all the necessary tools to perform automatic speech
+ recognition from an end-to-end system pretrained on CommonVoice (English Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).

 

  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
+ | 15-08-23 | 7.92 | 16.86 | 1xV100 32GB |
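
For readers unfamiliar with the Test CER and Test WER columns, the following is a minimal sketch of how such error rates are commonly computed: the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length, expressed as a percentage. The function names here are illustrative and are not taken from SpeechBrain; corpus-level scores are normally obtained by summing edits and reference lengths over all utterances rather than averaging per-utterance rates.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance as a percentage of the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: tokens are single characters."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```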

  ## Pipeline description

  This ASR system is composed of 2 different but linked blocks:
+ - Tokenizer (unigram) that transforms words into subword units and is trained on
+ the train transcriptions (train.tsv) of CommonVoice (EN).
+ - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60)) is combined with two DNN layers and fine-tuned on CommonVoice EN.
  The obtained final acoustic representation is given to the CTC decoder.

  The system is trained with recordings sampled at 16kHz (single channel).
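
The CTC decoding step named in the pipeline description can be illustrated with a minimal greedy-decoding sketch: take the most likely token per frame, collapse consecutive repeats, then drop the blank symbol. This is an assumption-laden illustration, not the decoder shipped with the model; the `blank_id=0` convention and the toy vocabulary below are hypothetical.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks.

    frame_ids: per-frame argmax token indices from the acoustic model.
    blank_id:  index of the CTC blank token (assumed to be 0 here).
    """
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


if __name__ == "__main__":
    # Hypothetical vocabulary for illustration only.
    vocab = ["<blank>", "h", "i"]
    frames = [1, 1, 0, 2, 2, 0, 0]
    print("".join(vocab[t] for t in ctc_greedy_decode(frames)))  # hi
```

A blank between two identical tokens keeps them separate, which is how CTC represents genuine repeated symbols.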
 
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

+ ### Transcribing your own audio files (in English)

  ```python
  from speechbrain.pretrained import EncoderASR

+ asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-14-en", savedir="pretrained_models/asr-wav2vec2-commonvoice-14-en")
+ asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-14-en/example-en.wav")

  ```
  ### Inference on GPU