speechbrain/asr-crdnn-commonvoice-14-en


poonehmousavi committed on Aug 15, 2023

Commit 1b9351a (1 parent: c6ca39f)

Update README.md

Files changed (1): README.md (+15 -12)

@@ -1,24 +1,27 @@
 ---
-language: "de"
-thumbnail:
 tags:
 - automatic-speech-recognition
 - CTC
 - Attention
 - pytorch
 - speechbrain
-license: "apache-2.0"
 datasets:
 - common_voice
 metrics:
-- wer
-- cer
 ---
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
-# CRDNN with CTC/Attention trained on CommonVoice 14.0 German (No LM)
 This repository provides all the necessary tools to perform automatic speech
 recognition from an end-to-end system pretrained on CommonVoice (German Language) within
 SpeechBrain. For a better experience, we encourage you to learn more about
@@ -27,7 +30,7 @@ The performance of the model is the following:
 | Release | Test CER | Test WER | GPUs |
 |:-------------:|:--------------:|:--------------:| :--------:|
-| 15.08.23 | 3.82 | 12.25 | 1xV100 16GB |
 ## Credits
 The model is provided by [vitas.ai](https://www.vitas.ai/).
@@ -36,7 +39,7 @@ The model is provided by [vitas.ai](https://www.vitas.ai/).
 This ASR system is composed of 2 different but linked blocks:
 - Tokenizer (unigram) that transforms words into subword units and trained with
-the train transcriptions (train.tsv) of CommonVoice (DE).
 - Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
 N blocks of convolutional neural networks with normalization and pooling on the
 frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
@@ -55,12 +58,12 @@ pip install speechbrain
 Please notice that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
-### Transcribing your own audio files (in German)
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
-asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/speechbrain/asr-crdnn-commonvoice-14-de", savedir="pretrained_models/speechbrain/asr-crdnn-commonvoice-14-de")
-asr_model.transcribe_file("speechbrain/speechbrain/asr-crdnn-commonvoice-14-de/example-de.wav")
 ```
 ### Inference on GPU
@@ -94,7 +97,7 @@ pip install -e .
 ```
 cd recipes/CommonVoice/ASR/seq2seq
-python train.py hparams/train_de.yaml --data_folder=your_data_folder
 ```
 You can find our training results (models, logs, etc) [here](https://www.dropbox.com/sh/zgatirb118f79ef/AACmjh-D94nNDWcnVI4Ef5K7a?dl=0)

 ---
+language:
+- en
+thumbnail: null
 tags:
 - automatic-speech-recognition
 - CTC
 - Attention
 - pytorch
 - speechbrain
+license: apache-2.0
 datasets:
 - common_voice
 metrics:
+  - name: Test WER
+    type: wer
+    value: ' 23.88'
 ---
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
+# CRDNN with CTC/Attention trained on CommonVoice 14.0 English (No LM)
 This repository provides all the necessary tools to perform automatic speech
 recognition from an end-to-end system pretrained on CommonVoice (German Language) within
 SpeechBrain. For a better experience, we encourage you to learn more about
 | Release | Test CER | Test WER | GPUs |
 |:-------------:|:--------------:|:--------------:| :--------:|
+| 15.08.23 | 12.76 | 23.88 | 1xV100 32GB |
 ## Credits
 The model is provided by [vitas.ai](https://www.vitas.ai/).
 This ASR system is composed of 2 different but linked blocks:
 - Tokenizer (unigram) that transforms words into subword units and trained with
+the train transcriptions (train.tsv) of CommonVoice (en).
 - Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
 N blocks of convolutional neural networks with normalization and pooling on the
 frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
 Please notice that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
+### Transcribing your own audio files (in English)
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
+asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/speechbrain/asr-crdnn-commonvoice-14-en", savedir="pretrained_models/speechbrain/asr-crdnn-commonvoice-14-en")
+asr_model.transcribe_file("speechbrain/speechbrain/asr-crdnn-commonvoice-14-en/example-en.wav")
 ```
 ### Inference on GPU
 ```
 cd recipes/CommonVoice/ASR/seq2seq
+python train.py hparams/train_en.yaml --data_folder=your_data_folder
 ```
 You can find our training results (models, logs, etc) [here](https://www.dropbox.com/sh/zgatirb118f79ef/AACmjh-D94nNDWcnVI4Ef5K7a?dl=0)
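
The results table in this README reports Test CER and Test WER. As a rough illustration of what those two numbers measure, here is a minimal sketch that computes both as the edit distance between reference and hypothesis divided by the reference length, expressed as a percentage. The function names `edit_distance` and `error_rate` are hypothetical and are not part of SpeechBrain; the recipe's own scoring may normalize text differently, so values from this sketch are not guaranteed to match the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Error rate in percent; unit="word" gives WER, unit="char" gives CER.

    Hypothetical helper for illustration only (not a SpeechBrain API).
    """
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        ref, hyp = list(reference), list(hypothesis)
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {error_rate(reference, hypothesis, 'word'):.2f}")
    print(f"CER: {error_rate(reference, hypothesis, 'char'):.2f}")
```

Because the denominator is the reference length, an error rate can exceed 100 when the hypothesis contains many insertions.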