- wer
- cer
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# CRDNN with CTC/Attention trained on CommonVoice 7.0 German (No LM)
This repository provides all the necessary tools to perform automatic speech
recognition with an end-to-end system pretrained on CommonVoice (German) within
SpeechBrain. For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io).
The performance of the model is the following:
 
| Release | Test CER | Test WER | GPUs |
|:-------------:|:--------------:|:--------------:|:--------:|
| 28.10.21 | 4.93 | 15.37 | 1xV100 16GB |
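
For context, WER and CER measure the edit distance between the model's transcriptions and the reference transcriptions at the word and character level, respectively. The official numbers above come from the SpeechBrain recipe; the snippet below is only a minimal sketch of how such metrics can be computed, assuming the third-party `jiwer` package and made-up example strings:

```python
# Minimal sketch: computing WER/CER on (reference, hypothesis) pairs.
# Assumes the third-party `jiwer` package (pip install jiwer); the strings
# below are made-up examples, not data from this model.
from jiwer import cer, wer

references = ["das ist ein test", "guten morgen"]
hypotheses = ["das ist ein fest", "guten morgen"]

print(f"WER: {wer(references, hypotheses):.2%}")  # word error rate over the corpus
print(f"CER: {cer(references, hypotheses):.2%}")  # character error rate over the corpus
```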
 
## Credits
The model is provided by [vitas.ai](https://www.vitas.ai/).
 
## Pipeline description
This ASR system is composed of two different but linked blocks:

- Tokenizer (unigram) that transforms words into subword units; it is trained on
the training transcriptions (train.tsv) of CommonVoice (DE). A minimal tokenizer
sketch is shown after this list.
- Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
N blocks of convolutional neural networks with normalization and pooling on the
frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
the final acoustic representation that is given to the CTC and attention decoders.
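
As an illustration of the first block, the sketch below shows how a unigram subword tokenizer of this kind can be trained and applied with the SentencePiece library; the file names and vocabulary size are hypothetical and do not necessarily match the settings used for this model:

```python
# Illustrative sketch only: training and applying a unigram subword tokenizer
# with SentencePiece. File names and vocab_size are hypothetical and do not
# necessarily match the settings used for this model.
import sentencepiece as spm

# Train a unigram model on the training transcriptions (one sentence per line).
spm.SentencePieceTrainer.train(
    input="train_transcriptions.txt",  # hypothetical text dump of the train.tsv transcriptions
    model_prefix="de_unigram",
    vocab_size=500,                    # hypothetical vocabulary size
    model_type="unigram",
)

# Apply it: words are split into subword units.
sp = spm.SentencePieceProcessor(model_file="de_unigram.model")
print(sp.encode("guten morgen zusammen", out_type=str))
```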
 
## Install SpeechBrain
First of all, please install SpeechBrain with the following command:

```
pip install speechbrain
```
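
To quickly verify that the installation succeeded, you can optionally print the installed version (a simple sanity check, not required for the rest of this README):

```
python -c "import speechbrain; print(speechbrain.__version__)"
```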
 
Please note that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).
 
### Transcribing your own audio files (in German)

```python
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-commonvoice-de", savedir="pretrained_models/asr-crdnn-commonvoice-de")
asr_model.transcribe_file("speechbrain/asr-crdnn-commonvoice-de/example-de.wav")
```
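
The same `transcribe_file` method also works on your own recordings; just pass a local path (the path below is only a placeholder):

```python
# Reuses the asr_model loaded in the snippet above; the path is a placeholder.
asr_model.transcribe_file("/path/to/your_recording.wav")
```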
 
### Inference on GPU

To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
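
For example (a sketch reusing the same model repository as above):

```python
# Load the model on GPU by passing run_opts to from_hparams (requires a CUDA-capable GPU).
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-commonvoice-de",
    savedir="pretrained_models/asr-crdnn-commonvoice-de",
    run_opts={"device": "cuda"},
)
asr_model.transcribe_file("speechbrain/asr-crdnn-commonvoice-de/example-de.wav")
```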
 
## Parallel Inference on a Batch

Please [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to figure out how to transcribe a batch of input sentences in parallel using a pretrained model.
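
If the notebook is not accessible, the sketch below shows one possible way to batch several files with `transcribe_batch`; the file paths are placeholders, and the padding logic is only an assumption about how you might batch your own audio:

```python
# Sketch: batched transcription with transcribe_batch. File paths are placeholders.
import torch
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-commonvoice-de",
    savedir="pretrained_models/asr-crdnn-commonvoice-de",
)

# Load each file as a 1-D waveform tensor.
wavs = [asr_model.load_audio(path) for path in ["audio1.wav", "audio2.wav"]]

# Pad to a common length and keep each utterance's relative length.
batch = torch.nn.utils.rnn.pad_sequence(wavs, batch_first=True)
rel_lens = torch.tensor([wav.shape[0] for wav in wavs]) / batch.shape[1]

transcripts, _tokens = asr_model.transcribe_batch(batch, rel_lens)
print(transcripts)
```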
 
### Training

The model was trained with SpeechBrain (commit 986a2175).
To train it from scratch, follow these steps:

1. Clone SpeechBrain:

```bash
git clone https://github.com/speechbrain/speechbrain/
```
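
Optionally, if you want to match the exact commit mentioned above, you can check it out (the recipe may also run on more recent versions):

```bash
git -C speechbrain checkout 986a2175
```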
 
2. Install it:

```
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
 
3. Run training:

```
cd recipes/CommonVoice/ASR/seq2seq
python train.py hparams/train_de.yaml --data_folder=your_data_folder
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/13i7rdgVX7-qZ94Rtj6OdUgU-S6BbKKvw?usp=sharing).
 
### Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
# **About SpeechBrain**

- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/
 
# **Citing SpeechBrain**

Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
 